Repository logo

"Roget's Thesaurus" as a lexical resource for natural language processing

Loading...
Thumbnail ImageThumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Publisher

University of Ottawa (Canada)

Abstract

This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material---the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.

Description

Keywords

Citation

Source: Masters Abstracts International, Volume: 42-06, page: 2233.

Related Materials

Alternate Version