An Automatically Generated Lexical Knowledge Base with Soft Definitions

Title: An Automatically Generated Lexical Knowledge Base with Soft Definitions
Authors: Scaiano, Martin
Date: 2016
Abstract: There is a need for methods that understand and represent the meaning of text for use in Artificial Intelligence (AI). This thesis demonstrates a method to automatically extract a lexical knowledge base from dictionaries for the purpose of improving machine reading. Machine reading refers to a process by which a computer processes natural language text into a representation that supports inference or inter-connection with existing knowledge (Clark and Harrison, 2010). The term machine reading was coined by Etzioni et al. (2006).

There are a number of linguistic ideas associated with representing and applying the meaning of words which are unaddressed in current knowledge representations. This work draws heavily from the linguistic theory of frame semantics (Fillmore, 1976). A word is not a strictly defined construct; instead, it evokes our knowledge and experiences, and this information is adapted to a given context by human intelligence. This can often be seen in dictionaries, as a word may have many senses, but some are only subtle variations of the same theme or core idea. A further unaddressed issue is that sentences may have multiple reasonable and valid interpretations (or readings).

This thesis postulates that there must be algorithms that work with symbolic representations which can model how words evoke knowledge and then contextualize that knowledge. I attempt to answer this previously unaddressed question, “How can a symbolic representation support multiple interpretations, evoked knowledge, soft word senses, and adaptation of meaning?” Furthermore, I implement and evaluate the proposed solution.

This thesis proposes the use of a knowledge representation called Multiple Interpretation Graphs (MIGs), and a lexical knowledge structure called auto-frames to support contextualization. A MIG is used to store a single auto-frame, the representation of a sentence, or an entire text. MIGs and auto-frames are produced from dependency parse trees using an algorithm I call connection search. A MIG supports representing multiple different interpretations of a text, while auto-frames combine multiple word senses and information related to the word into one representation. Connection search contextualizes MIGs and auto-frames, and reduces the number of interpretations that are considered valid.

In this thesis, as proof of concept and evaluation, I extracted auto-frames from the Longman Dictionary of Contemporary English (LDOCE). I take the point of view that a word’s meaning depends on what it is connected to in its definition. I do not use a predetermined set of semantic roles; instead, auto-frames focus on the connections or mappings between a word’s context and its definitions.

Once I have extracted the auto-frames, I demonstrate how they may be contextualized. I then apply the lexical knowledge base to reading comprehension. The results show that this approach can produce good precision on this task, although more research and refinement are needed. The knowledge base and source code are made available to the community at or by contacting
Collection: Thèses, 2011 - // Theses, 2011 -