Szpakowicz, S.; Barker, Ken
1998 (deposited 2009-03-19)
Source: Dissertation Abstracts International, Volume 60-03, Section B, page 1157.
ISBN: 9780612367630
http://hdl.handle.net/10393/4443
http://dx.doi.org/10.20381/ruor-13860

When people read a text, they rely on a priori knowledge of language, common-sense knowledge and knowledge of the domain. Many natural language processing systems implement this human model of language understanding, and are therefore heavily knowledge-dependent. Such systems assume the availability of large amounts of background knowledge coded in advance in a specialized formalism. The problem with this assumption is that building a knowledge base with sufficient and relevant content is labour-intensive and very costly. Often, the resulting knowledge is either too specific to be used for more than one very narrow domain or too general to allow subtle analyses of texts. To avoid the problems of manually encoding background knowledge, many researchers have abandoned symbolic language analysis in favour of statistical methods. The availability of large online corpora and improvements in computing resources have made it possible to make predictions about meaning based on observations of frequencies, contexts, correlations, and other phenomena in a corpus. Systems that use statistical methods have had some impressive successes, notably in part-of-speech tagging, word-class clustering and word sense disambiguation. But these systems often require large amounts of analyzed language data to arrive at even shallow interpretations. Both kinds of natural language processing system seek a model of a text: knowledge-intensive systems a deep semantic model, corpus-based systems a much shallower distributional one. And both kinds of system depend on outside sources of data. This dissertation describes the construction and evaluation of an interactive tool that also seeks a model of a text.
The model takes the form of semantic relationships between syntactic elements in English sentences. The system also depends on an outside source of data: a cooperative user. Unlike knowledge-intensive and corpus-based systems, however, it requires neither a large repository of semantic information nor any previously analyzed data: it can start processing a text from scratch. The system inspects the surface syntax of a sentence to make informed decisions about its possible interpretations, then suggests these interpretations to the user. As more text is analyzed, the system learns from previous analyses to make better decisions, reducing its reliance on the user. Evaluation confirms that this semi-automatic acquisition of the model of a text is relatively painless for the user. The regular structure of the model identifies concepts that have different surface-syntactic forms. These concepts could be used as the knowledge base for expert systems or query-answering systems. They could serve as a conceptual profile of a text, allowing, for example, text indexing on semantic concepts instead of just keywords. The concepts and the semantic relationships between them could serve as base structures for text summarization, or as the domain-specific background-knowledge core for natural language processing systems that attempt deeper understanding of a text.

Title: Semiautomatic recognition of semantic relationships in English technical texts
Thesis, 193 p.
Subject: Language, General
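The interactive loop the abstract describes (inspect a syntactic cue, suggest candidate semantic relations to the user, record the user's decision, and rank future suggestions by past confirmations) can be illustrated with a minimal sketch. The relation inventory, the class and method names, and the use of a surface marker such as a preposition as the cue are all assumptions for illustration, not the dissertation's actual design.

```python
from collections import Counter, defaultdict

class SemiAutomaticAnalyzer:
    """Toy sketch: suggests semantic relations for a surface-syntactic
    cue, learning from user-confirmed analyses to rank later suggestions."""

    # Illustrative relation inventory (assumed, not from the thesis).
    RELATIONS = ["agent", "instrument", "location", "time", "purpose"]

    def __init__(self):
        # Counts of relations the user has confirmed for each cue.
        self.history = defaultdict(Counter)

    def suggest(self, cue):
        """Return candidate relations, most-often-confirmed first."""
        seen = self.history[cue]
        return sorted(self.RELATIONS, key=lambda r: -seen[r])

    def confirm(self, cue, relation):
        """Record the cooperative user's decision so that later
        suggestions for the same cue need less user intervention."""
        self.history[cue][relation] += 1

analyzer = SemiAutomaticAnalyzer()
# Starting from scratch: suggestions for "with" are in default order.
# After the user twice confirms "with" as marking an instrument...
analyzer.confirm("with", "instrument")
analyzer.confirm("with", "instrument")
# ...the system now offers "instrument" first for that cue.
print(analyzer.suggest("with")[0])  # instrument
```

The point of the sketch is the feedback loop: the system never needs a pre-built knowledge base, only the accumulating record of the user's own analyses.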