Accession Number:

ADA440512

Title:

Scenario Customization for Information Extraction

Descriptive Note:

Doctoral thesis

Corporate Author:

DEFENSE ADVANCED RESEARCH PROJECTS AGENCY ARLINGTON VA

Personal Author(s):

Report Date:

2001-01-01

Pagination or Media Count:

148

Abstract:

Information Extraction (IE) is an emerging NLP technology whose function is to process unstructured natural-language text, locate specific pieces of information, or facts, in the text, and use these facts to fill a database. IE systems today are commonly based on pattern matching. The core IE engine uses a cascade of pattern sets of increasing linguistic complexity. Each pattern consists of a regular expression and an associated mapping from syntactic to logical form. The pattern sets are customized for each new topic, as defined by the set of facts to be extracted. Construction of a pattern base for a new topic is recognized as a time-consuming and expensive process, and a principal roadblock to wider use of IE technology in the large.
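
The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's engine: the two stages, the regular expressions, the tag names (ORG, PER), and the "appoint" record layout are all assumptions chosen for illustration. Stage 1 applies low-level patterns that mark entity names; stage 2 applies a clause-level pattern over the tagged text and maps each match to a logical-form record.

```python
import re

# Stage 1 (assumed): low-level patterns that mark entity names with inline tags.
# Each entry pairs a regular expression with a hypothetical tag name.
ENTITY_PATTERNS = [
    (re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\."), "ORG"),
    (re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)*"), "PER"),
]


def to_appointment(match):
    """Map a matched clause (syntactic form) to a flat record (logical form)."""
    return {
        "predicate": "appoint",
        "company": match.group("org"),
        "person": match.group("per"),
        "position": match.group("post"),
    }


# Stage 2 (assumed): clause-level patterns over the tagged text.
# Each entry pairs a regular expression with its mapping to logical form.
CLAUSE_PATTERNS = [
    (
        re.compile(
            r"<ORG>(?P<org>[^<]+)</ORG> (?:appointed|named|hired) "
            r"<PER>(?P<per>[^<]+)</PER> as (?P<post>[a-z ]+?)[.,]"
        ),
        to_appointment,
    ),
]


def tag_entities(text):
    """Run stage 1: wrap every entity match in <TAG>...</TAG>."""
    for pattern, tag in ENTITY_PATTERNS:
        text = pattern.sub(
            lambda m, tag=tag: f"<{tag}>{m.group(0)}</{tag}>", text
        )
    return text


def extract_facts(text):
    """Run the full cascade and return the list of extracted records."""
    tagged = tag_entities(text)
    facts = []
    for pattern, build in CLAUSE_PATTERNS:
        for match in pattern.finditer(tagged):
            facts.append(build(match))
    return facts


if __name__ == "__main__":
    print(extract_facts("Acme Corp. appointed Dr. Jane Smith as president."))
```

In this sketch, moving to a new topic would mean replacing the entries in CLAUSE_PATTERNS and their mapping functions, which is the customization step the abstract identifies as costly.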

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE