Accession Number:

ADA458883

Title:

MUC-4 Test Results and Analysis

Descriptive Note:

Conference paper

Corporate Author:

LANGUAGE SYSTEMS INC WOODLAND HILLS CA

Report Date:

1992-01-01

Pagination or Media Count:

6

Abstract:

LSI's overall natural language processing (NLP) objective is the development of a broad-coverage, reusable system that is readily transportable to additional domains, applications, and sublanguages in English, and that provides a foundation for our multilingual work. Our system, called DBG (for Data Base Generator), comprises a set of NLP components that have been developed, extended, and rebuilt over a period of some years. The core of the system is an innovative principle-based parser, using ideas from [1], which we began developing in the course of MUC-3 to replace our previous chart parser. Our approach thus relies on powerful, robust parsing as the most crucial component of an NLP system. In applying our NLP system to text extraction, our ultimate objective is to develop a high-quality text extraction system, where high quality is defined as scoring above 80, a number well beyond any current MUC scores. In line with these NLP objectives, our major focus for MUC-4 was a follow-up to our main lesson learned in MUC-3, which was to acquire a machine-readable dictionary (MRD) and integrate its content into the DBG system. When attempts to acquire the computer-friendly Longmans or one of the Oxford dictionaries were unsuccessful, we turned to the ACL's CD-ROM containing the Collins English Dictionary (CED). The most correct version of the CED on the ACL CD-ROM was apparently developed directly from a medium prepared for the typographer, and unfortunately lacks any documentation of features, fonts, language, etc. The effort of acquiring and integrating the CED was clearly worthwhile, since we were able to increase the number of entries in our lexicon threefold in a relatively short time (see Table 1). The increase in lexicon size will benefit all the applications LSI is currently working on.

Subject Categories:

  • Information Science
  • Linguistics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE