Title: | "Roget's Thesaurus" as a lexical resource for natural language processing |
Authors: | Jarmasz, Mario |
Date: | 2003 |
Abstract: | This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material; it is the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's.
We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined. |
URL: | http://hdl.handle.net/10393/26493 http://dx.doi.org/10.20381/ruor-18213 |
Collection: | Thèses, 1910 - 2010 // Theses, 1910 - 2010 |
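The abstract names semantic similarity between words as one application of the computerized Thesaurus, but the record does not spell out the measure. The sketch below is illustrative only and assumes a simple edge-counting approach over Roget's nested levels (Class, Section, Subsection, Head Group, Head, part of speech, Paragraph, Semicolon Group). It scores two senses by how deep their shared ancestry goes and gives each word pair the score of its best sense pair. The level list, the scoring constant, the `TOY_INDEX` entries, and the function names `path_distance` and `similarity` are all hypothetical. No real Thesaurus data is used.

```python
from itertools import product

# Levels of the hierarchy from root to leaf (assumed inventory for this sketch).
LEVELS = ("class", "section", "subsection", "head_group",
          "head", "pos", "paragraph", "semicolon_group")

# Longest possible path: up through every level from one sense, then down to the other.
MAX_DISTANCE = 2 * len(LEVELS)  # 16

# Hypothetical toy index: word -> list of senses; each sense is a tuple of
# labels, one per entry in LEVELS. These labels are invented, not Roget's data.
TOY_INDEX = {
    "car":   [("c3", "s1", "ss1", "hg1", "h1", "N", "p1", "sg1")],
    "auto":  [("c3", "s1", "ss1", "hg1", "h1", "N", "p1", "sg1")],
    "truck": [("c3", "s1", "ss1", "hg1", "h1", "N", "p1", "sg2")],
    "coach": [("c3", "s1", "ss1", "hg1", "h1", "N", "p2", "sg1"),
              ("c6", "s2", "ss1", "hg1", "h1", "N", "p1", "sg1")],
    "train": [("c3", "s1", "ss1", "hg1", "h2", "N", "p1", "sg1")],
    "idea":  [("c6", "s2", "ss2", "hg1", "h1", "N", "p1", "sg1")],
}


def path_distance(a, b):
    """Edges between two senses, counted through their deepest shared level."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Senses in the same semicolon group get 0; senses sharing no level get 16.
    return 2 * (len(LEVELS) - shared)


def similarity(word1, word2, index=TOY_INDEX):
    """MAX_DISTANCE minus the shortest path over all sense pairs; None if a word is unknown."""
    senses1, senses2 = index.get(word1), index.get(word2)
    if not senses1 or not senses2:
        return None
    shortest = min(path_distance(a, b) for a, b in product(senses1, senses2))
    return MAX_DISTANCE - shortest


for other in ("auto", "truck", "coach", "train", "idea"):
    print("car", other, similarity("car", other))
# car auto 16
# car truck 14
# car coach 12
# car train 8
# car idea 0
```

Taking the minimum path over all sense pairs means a polysemous word, such as the toy `coach`, is scored by whichever of its senses lies closest to the other word. A fixed-depth tree keeps the arithmetic simple: every step up to a coarser shared level adds two edges.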