"Roget's Thesaurus" as a lexical resource for natural language processing

Field  Value
dc.contributor.advisor  Szpakowicz, Stan
dc.contributor.author  Jarmasz, Mario
dc.date.accessioned  2013-11-07T17:24:40Z
dc.date.available  2013-11-07T17:24:40Z
dc.date.created  2003
dc.date.issued  2003
dc.identifier.citation  Source: Masters Abstracts International, Volume: 42-06, page: 2233.
dc.identifier.uri  http://hdl.handle.net/10393/26493
dc.identifier.uri  http://dx.doi.org/10.20381/ruor-18213
dc.description.abstract  This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material, the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.
dc.format.extent  220 p.
dc.language.iso  en
dc.publisher  University of Ottawa (Canada)
dc.subject.classification  Language, Linguistics.
dc.subject.classification  Artificial Intelligence.
dc.subject.classification  Computer Science.
dc.title  "Roget's Thesaurus" as a lexical resource for natural language processing
dc.type  Thesis
dc.degree.name  M.C.S.
dc.degree.level  Masters
Collection  Thèses, 1910 - 2010 // Theses, 1910 - 2010

Files
MQ90084.PDF  11.32 MB  Adobe PDF