Automated acquisition of technical concepts from unrestricted English text using noun phrase classification.

Publisher

University of Ottawa (Canada)

Abstract

This thesis describes an approach to acquiring technical concepts from free English text without using knowledge specific to the domain of expertise the text describes. Only syntactic knowledge and text statistics are used to classify each noun phrase in the text into one of five categories of technicality, ranging from Technical to Not Technical. The algorithms devised and their performance are discussed.

A secondary topic addressed in this thesis is syntactic category disambiguation. The Noun Phrase Classification module requires a Sentence Parser to extract the syntactic structure of each sentence in the text, so the syntactic category (noun, verb, preposition, and so on) of each word must appear in the Sentence Parser's Word Dictionary. A syntactic category disambiguation module was therefore designed: whenever an unknown word (one not defined in the Word Dictionary) is encountered in the text, the module attempts to determine its syntactic category automatically from the categories of the neighbouring words, using a bottom-up chart parser and text statistics.

Citation

Source: Masters Abstracts International, Volume: 32-05, page: 1417.
