
Exploring word and sentence similarity in corpus

dc.contributor.author: Zaki, Souhail
dc.date.accessioned: 2013-11-07T18:11:53Z
dc.date.available: 2013-11-07T18:11:53Z
dc.date.created: 2004
dc.date.issued: 2004
dc.degree.level: Masters
dc.degree.name: M.A.Sc.
dc.description.abstract: This research addresses the problem of deriving semantic similarity between the words of a language using corpora and methods that compare contextual distributions. It aims to capture, in a comprehensive way, the similar behavior of words and thereby properly estimate the semantic similarity between words of the language. The proposed framework is incremental and iterative: the system combines edit distance with the incremental results to obtain an accurate similarity measure. In addition, a sentence similarity system is built on top of the word similarity model. The proposed model rests on observing the behavior of words in large amounts of natural text. As for the strategy followed in this thesis, we first examine existing similarity measures and their hypotheses, and show that, under close scrutiny, these measures fail to account for some linguistic features relevant to estimating word similarity. We then present a model that enhances these measures to take linguistic characteristics into account. The model takes a large amount of raw text as input, extracts context distributions, and infers the similarity between words from these distributions and the Normalized Edit Distance (NED). (Abstract shortened by UMI.)
dc.format.extent: 99 p.
dc.identifier.citation: Source: Masters Abstracts International, Volume: 43-06, page: 2371.
dc.identifier.uri: http://hdl.handle.net/10393/26821
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-18388
dc.language.iso: en
dc.publisher: University of Ottawa (Canada)
dc.subject.classification: Engineering, Electronics and Electrical.
dc.title: Exploring word and sentence similarity in corpus
dc.type: Thesis
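The abstract describes inferring word similarity from context distributions combined with the Normalized Edit Distance (NED). The record does not give the thesis's exact formulas, so the Python below is only a minimal sketch under stated assumptions: context vectors are co-occurrence counts in a window of 2 words on each side, distributional similarity is cosine similarity, NED is Levenshtein distance divided by the length of the longer word, and `combined_similarity` with its weight `alpha` is a hypothetical linear combination, not the author's actual method.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """ASSUMED normalization: divide by the longer length, giving a value in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def context_vectors(tokens, window=2):
    """Count the words occurring within `window` positions of each word type."""
    vectors = {}
    for i, word in enumerate(tokens):
        ctx = vectors.setdefault(word, Counter())
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                ctx[tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def combined_similarity(w1, w2, vectors, alpha=0.8):
    """HYPOTHETICAL linear mix of distributional and orthographic similarity."""
    dist_sim = cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))
    orth_sim = 1.0 - normalized_edit_distance(w1, w2)
    return alpha * dist_sim + (1 - alpha) * orth_sim


if __name__ == "__main__":
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vectors = context_vectors(tokens)
    print(normalized_edit_distance("kitten", "sitting"))
    print(combined_similarity("cat", "dog", vectors))
```

In this toy corpus "cat" and "dog" share most of their contexts ("the", "sat", "on") but no characters in common position, so their similarity comes almost entirely from the distributional term; a mix like this lets orthographic overlap (for example inflected forms) raise similarity when contexts are sparse.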

Files

Original bundle

Name: MR01657.PDF
Size: 3.9 MB
Format: Adobe Portable Document Format