Compact features for sentiment analysis

Publisher

University of Ottawa (Canada)

Abstract

This work examines a novel method of constructing features for machine learning of sentiment analysis and related tasks. Such tasks are frequently approached with a Bag of Words representation, which uses one feature for each word encountered in the training data, so the feature count can easily reach the thousands or tens of thousands. This thesis develops a set of "numeric" features by learning a score for each word, dividing the range of possible scores into a number of bins, and then generating features by counting how many words in each document have scores in each bin. This allows effective learning of sentiment and related tasks with only 25 features; in fact, performance was very often slightly better with these features than with Bag of Words. The reduction in the number of features makes it possible to process much larger collections of texts than previously attempted. In addition, we carefully consider how performance on ordinal problems should be evaluated.
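
The binning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how word scores are learned, so the scoring rule here (the share of a word's occurrences that fall in positive documents, rescaled to the range -1 to 1) is an assumption made purely for illustration, as are the function names `learn_word_scores` and `bin_features`, the whitespace tokenization, and the use of equal-width bins. The default of 25 bins is chosen only to match the feature count mentioned above; the thesis may construct its 25 features differently.

```python
from collections import Counter


def learn_word_scores(docs, labels):
    """Assign each training word a score in [-1, 1].

    Hypothetical scoring rule (not taken from the thesis): the fraction of a
    word's occurrences that appear in positive documents, rescaled so that
    -1 means "only in negative documents" and +1 means "only in positive".
    """
    pos, total = Counter(), Counter()
    for text, label in zip(docs, labels):
        for word in text.lower().split():
            total[word] += 1
            if label == 1:
                pos[word] += 1
    return {word: 2 * pos[word] / total[word] - 1 for word in total}


def bin_features(text, scores, n_bins=25, lo=-1.0, hi=1.0):
    """Count how many words of `text` have a score falling in each bin.

    The score range [lo, hi] is split into `n_bins` equal-width bins
    (an assumption). Words without a learned score are skipped.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for word in text.lower().split():
        if word not in scores:
            continue
        # Clamp so that a score equal to `hi` lands in the last bin.
        idx = min(int((scores[word] - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


train_docs = ["good great film", "bad awful film"]
train_labels = [1, 0]  # 1 = positive, 0 = negative
scores = learn_word_scores(train_docs, train_labels)
features = bin_features("good bad film unknown", scores)
print(len(features), sum(features))
```

Whatever the vocabulary size, each document is reduced to a fixed-length vector of bin counts, which is what keeps the feature count small compared with one feature per word.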

Citation

Source: Masters Abstracts International, Volume: 48-06, page: 3709.
