The role of named entities in text classification
Publisher
University of Ottawa (Canada)
Abstract
Named entities, typically names of people, places and organizations, are textual elements present in almost any type of document. The general techniques used to extract them and their variable-length property also make them an attractive type of attribute to study in text classification. In this thesis, several datasets are characterized as being either dependent on or independent of named entities using a Naive Bayes-based ranking technique. Using this characterization, results are presented which show that named entities are in fact useful in classification tasks, and that accuracy can be improved by treating them as a special type of attribute. Specifically, the inclusion of regular terms, the named entity representation, and the frequency with which a classifier is retrained all have an impact on the classification of documents where named entities are important.
Citation
Source: Masters Abstracts International, Volume: 44-04, page: 1919.
