Exploring word and sentence similarity in corpus
Publisher
University of Ottawa (Canada)
Abstract
This research addresses the problem of deriving semantic similarity between the words of a language using corpora and methods that compare contextual distributions. It aims to capture, in a comprehensive way, the similar behavior of words and thereby to properly estimate the semantic similarity between them. The framework proposed for this purpose is incremental and iterative. The system combines the Edit distance with the incremental results to obtain an accurate similarity measure. Moreover, a sentence similarity system is developed on top of the word similarity model. The proposed model rests on observing the behavior of words in large amounts of natural text.
In this thesis, we first examine existing similarity measures and their underlying hypotheses, and show that, under close scrutiny, these measures fail to account for some linguistic features relevant to estimating word similarity. We then present a model that enhances these measures by taking linguistic characteristics into account.
The suggested model takes a large amount of raw data as input, extracts distributions of contexts, and infers similarity between words from these distributions and the Normalized Edit Distance (NED). (Abstract shortened by UMI.)
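The two ingredients named in the abstract, context distributions and a normalized edit distance, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the fixed context window, the cosine comparison of distributions, the normalization of the edit distance by the longer sequence length, and all function names. The abstract does not say how the thesis combines the two quantities, so the sketch keeps them separate.

```python
from collections import Counter
from math import sqrt


def context_distribution(tokens, target, window=2):
    """Count words seen within `window` positions of each occurrence of `target`.

    The symmetric fixed-size window is an assumption for this sketch.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity of two count distributions (an assumed comparison method)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def edit_distance(s, t):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(s, t):
    """Edit distance divided by the longer length (one common normalization, assumed here)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0


tokens = "the cat sat on the mat the dog sat on the rug".split()
cat_ctx = context_distribution(tokens, "cat")
dog_ctx = context_distribution(tokens, "dog")
similarity = cosine(cat_ctx, dog_ctx)
ned = normalized_edit_distance("kitten", "sitting")
```

In this toy corpus, "cat" and "dog" share most of their neighboring words, so their context distributions are close under the cosine comparison. Because `edit_distance` accepts any sequence, the same function could be applied to lists of tokens rather than characters, which is one plausible route to comparing sentences.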
Citation
Source: Masters Abstracts International, Volume: 43-06, page: 2371.
