Influence of word sense disambiguation on text classification

dc.contributor.author: Widlak, Magdalena
dc.date.accessioned: 2013-11-07T17:26:07Z
dc.date.available: 2013-11-07T17:26:07Z
dc.date.created: 2004
dc.date.issued: 2004
dc.degree.level: Masters
dc.degree.name: M.C.S.
dc.description.abstract: Word sense ambiguity is a pervasive characteristic of natural language. The discrimination of word senses, word sense disambiguation, is considered to be of prime importance for many areas involving computerized language analysis, from machine translation to information retrieval. Text classification, as a growing subfield of information retrieval, is also believed to suffer from the effects of word sense ambiguity. The purpose of this thesis was to evaluate how word sense disambiguation affects text classification. The intuitive hypothesis is that word sense disambiguation aids the task of text classification. In order to evaluate the influence of word sense disambiguation on text classification, three different corpora of text documents were disambiguated manually. Classification of both the original and the corresponding disambiguated data was performed using four different classification systems employing four different learning approaches: decision trees (C5.0), decision rule induction (Ripper), Naive Bayes (Rainbow), and support vector machines (LibSVM). Results obtained from the classification were compared using various evaluation methods. The results do not support the stated hypothesis very strongly. In some cases word sense disambiguation improved the results of text classification; in other cases there was no improvement or the results were worse. The differences in classification results obtained on original and disambiguated data are in most cases insignificant: even though there is a slight difference in average errors, we cannot conclude that this difference is statistically significant. Some general tendencies can be observed in the performance of specific classification systems. We can also infer which of the corpora were "easier" to classify than others.
dc.format.extent: 79 p.
dc.identifier.citation: Source: Masters Abstracts International, Volume: 43-06, page: 2295.
dc.identifier.uri: http://hdl.handle.net/10393/26808
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-18381
dc.language.iso: en
dc.publisher: University of Ottawa (Canada)
dc.subject.classification: Computer Science.
dc.title: Influence of word sense disambiguation on text classification
dc.type: Thesis

Files

Original bundle

Name: MR01643.PDF
Size: 6.06 MB
Format: Adobe Portable Document Format