Accession Number:

ADA458576

Title:

UMass/Hughes: Description of the Circus System Used for MUC-5

Descriptive Note:

Corporate Author:

MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE

Report Date:

1993-01-01

Pagination or Media Count:

16

Abstract:

The primary goal of our effort is the development of robust and portable language processing capabilities for information extraction applications. The system under evaluation here is based on language processing components that have demonstrated strong performance capabilities in previous evaluations (Lehnert et al., 1992a). Having demonstrated the general viability of these techniques, we are now concentrating on the practicality of our technology by creating trainable system components to replace hand-coded data and manually-engineered software. Our general strategy is to automate the construction of domain-specific dictionaries and other language-related resources so that information extraction can be customized for specific applications with a minimal amount of human assistance. We employ a hybrid system architecture that combines selective concept extraction (Lehnert, 1991) technologies developed at UMass with trainable classifier technologies developed at Hughes (Dolan et al., 1991). Our MUC-5 system incorporates seven trainable language components to handle (1) lexical recognition and part-of-speech tagging, (2) knowledge of semantic/syntactic interactions, (3) semantic feature tagging, (4) noun phrase analysis, (5) limited coreference resolution, (6) domain object recognition, and (7) relational link recognition. Our trainable components have been developed so that domain experts who have no background in natural language or machine learning can train individual system components in the space of a few hours.

Subject Categories:

  • Information Science
  • Linguistics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE