Exploring word and sentence similarity in corpus

Zaki, Souhail

Files

MR01657.PDF (3.9 MB)

Date

2004

Authors

Zaki, Souhail

Publisher

University of Ottawa (Canada)

Abstract

This research addresses the problem of deriving semantic similarity between the words of a language using corpora and methods that compare contextual distributions. It aims to capture, in a comprehensive way, the similar behavior of words and thereby to estimate properly the semantic similarity between them. The framework proposed for this purpose is incremental and iterative. The system combines the edit distance with the incremental results to obtain an accurate similarity measure. Moreover, a sentence similarity system is developed on top of the word similarity model. The proposed model rests on observing the behavior of words in large amounts of natural text. As for the strategy followed in this thesis, we first examine existing similarity measures and their hypotheses, and show how these measures, under close scrutiny, fail to account for some linguistic features when estimating word similarity. We then present a model that enhances these measures to take linguistic characteristics into account. The suggested model takes large amounts of raw data as input, extracts distributions of contexts, and infers similarity between words from these distributions and the Normalized Edit Distance (NED). (Abstract shortened by UMI.)
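The abstract names two ingredients, context distributions and a normalized edit distance, without giving their exact definitions. The following is a minimal sketch of both under stated assumptions, not the thesis's actual method: the window-based context extraction, the cosine comparison, the normalization by the longer sequence length, and all function names (`levenshtein`, `normalized_edit_distance`, `context_counts`, `cosine`) are illustrative choices that do not come from the source.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length.

    Assumption: this is one common normalization; the thesis may use another.
    """
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def context_counts(tokens, target, window=2):
    """Count words within `window` positions of each occurrence of `target`.

    Assumption: a symmetric fixed-size window stands in for the thesis's
    (unspecified) notion of context.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(c1, c2):
    """Cosine similarity between two context-count distributions."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0
```

As a usage example, `normalized_edit_distance("kitten", "sitting")` evaluates to 3/7, since three edits separate the two strings and the longer one has seven characters. Calling `cosine(context_counts(tokens, "cat"), context_counts(tokens, "dog"))` on a tokenized corpus compares the contexts in which the two words occur. How the thesis combines the two signals is not stated in the abstract, so the sketch keeps them separate.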

Citation

Source: Masters Abstracts International, Volume: 43-06, page: 2371.

URI

http://hdl.handle.net/10393/26821
http://dx.doi.org/10.20381/ruor-18388

Collections

Theses, 1910 - 2010
