Knowledge extraction technology for terminology.

dc.contributor.advisor: Meyer, Ingrid
dc.contributor.author: Davidson, Laura M.
dc.identifier.citation: Source: Masters Abstracts International, Volume: 37-02, page: 0398.
dc.description.abstract: Terminologists scan large amounts of specialized texts to discover the terms for the concepts in a given subject field and to extract knowledge-rich contexts. These contexts make explicit, by means of linguistic structures, the semantic relations that exist between the concepts. Developing the subject field's conceptual network is called concept analysis. To carry out concept analysis, many terminologists are still using paper-based corpora. Yet this is time-consuming and error-prone. This thesis explores a semi-automatic approach to concept analysis that involves electronic corpora and knowledge extraction technology. My research focused on a program called the Text Analyzer (TA), which I tested for its effectiveness in retrieving knowledge-rich contexts from French and English electronic corpora in the subject field of composting. I first discovered the linguistic patterns that French and English use to express three semantic relations. The TA was then programmed with these patterns to be able to extract knowledge-rich contexts from the corpora. I then tested the TA's extraction capabilities and prepared statistics showing its effectiveness. Analysing the test results revealed ways to enhance the TA's performance. As a small follow-up experiment, I added more patterns to the TA and again tested its extraction effectiveness, which was improved. Up to that point, my focus was on lexical patterns. As part of the follow-up experiment, I also performed an exploratory test of the potential of grammatical patterns for knowledge extraction. This research revealed that much work is still needed to produce highly effective knowledge extraction programs. Even so, the statistics were encouraging and showed this technology's potential for dramatically reducing the time terminologists spend scanning corpora.
dc.format.extent: 128 p.
dc.publisher: University of Ottawa (Canada)
dc.subject.classification: Language, Modern.
dc.title: Knowledge extraction technology for terminology.
Collection: Thèses, 1910 - 2010 // Theses, 1910 - 2010

File: MQ32532.PDF (5.34 MB, Adobe PDF)