An evaluation of a rule-based parser of English sentences.

Publisher

University of Ottawa (Canada)

Abstract

DIPETT (Domain Independent Parser of English Technical Text) is a broad-coverage parser of English technical text that is used primarily in the TANKA (Text Analysis for Knowledge Acquisition) project. The TANKA project seeks to build a model of a technical domain by semi-automatically processing written text that describes the domain. No other source of domain-specific knowledge is available. The accuracy and completeness of a semantic representation generated by TANKA are partly determined by the accuracy of DIPETT's syntactic analysis of the text. The thesis argues that a test suite for a broad-coverage natural language parser must be systematic, broad in its coverage of the phenomena tested, and corpus-like in its coverage of phenomenon interaction. A test suite of example sentences extracted from Quirk et al.'s comprehensive English grammar is proposed, and the results of evaluating DIPETT on that suite are compared with the results of evaluating it on a publicly available test suite, TSNLP (Test Suites for Natural Language Processing). (Abstract shortened by UMI.)

Citation

Source: Masters Abstracts International, Volume: 39-05, page: 1409.
