Compact features for sentiment analysis

Gaudette, Lisa

Files

MR61163.PDF (3.96 MB)

Date

2009

Authors

Gaudette, Lisa

Publisher

University of Ottawa (Canada)

Abstract

This work examines a novel method of developing features to use for machine learning of sentiment analysis and related tasks. This task is frequently approached using a Bag of Words representation -- one feature for each word encountered in the training data -- which can easily number in the thousands or tens of thousands. This thesis develops a set of "numeric" features, by learning scores for words, dividing the range of possible scores into a number of bins, and then generating features based on counting how many words in each document have scores in each bin. This allows for effective learning of sentiment and related tasks with 25 features; in fact, performance was very often slightly better with these features. This reduction in the number of features allows for the processing of much larger collections of texts than previously attempted. In addition, we carefully consider the problem of evaluating ordinal problems.
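
The binning idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The scoring function (a normalized difference of document counts), the binary labels (1 for positive, 0 for negative), and the names `learn_word_scores` and `bin_features` are all assumptions made for illustration. The default of 25 bins simply mirrors the feature count mentioned in the abstract; the thesis may construct its 25 features differently.

```python
from collections import Counter


def learn_word_scores(docs, labels):
    """Score each word in [-1, 1] from labeled training documents.

    Assumption: score = (pos - neg) / (pos + neg), where pos and neg are
    the numbers of positive and negative documents containing the word.
    The thesis may use a different scoring function.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos if label == 1 else neg
        target.update(set(tokens))  # count each word once per document
    vocab = set(pos) | set(neg)
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in vocab}


def bin_features(tokens, scores, n_bins=25, low=-1.0, high=1.0):
    """Count how many scored words of a document fall into each bin."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for word in tokens:
        if word not in scores:
            continue  # words unseen in training contribute nothing
        index = int((scores[word] - low) / width)
        counts[min(index, n_bins - 1)] += 1  # a score equal to `high` goes in the last bin
    return counts


# Toy usage: two training documents, then one new document.
train_docs = [["good", "great", "film"], ["bad", "awful", "film"]]
train_labels = [1, 0]
scores = learn_word_scores(train_docs, train_labels)
features = bin_features(["good", "film", "bad", "unknown"], scores, n_bins=5)
```

Each document becomes a fixed-length vector of bin counts regardless of vocabulary size, which is what makes the representation compact compared with one feature per word.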

Citation

Source: Masters Abstracts International, Volume: 48-06, page: 3709.

URI

http://hdl.handle.net/10393/28295
http://dx.doi.org/10.20381/ruor-19182

Collections

Theses, 1910 - 2010
