Information Extraction Overview

Okurowski, Mary E.

Active / Technical Report | Accession Number: ADA633427

Abstract:

The information explosion of the last decade has placed increasing demands on processing and analyzing large volumes of on-line data. In response, the Advanced Research Projects Agency (ARPA) has been supporting research to develop a new technology called information extraction. Information extraction is a type of document processing that captures and outputs factual information contained within a document. Like an information retrieval (IR) system, an information extraction system responds to a user's information need. Whereas an IR system identifies a subset of documents in a large text database (or, in a library scenario, a subset of resources in a library), an information extraction system identifies a subset of information within a document. This subset of information is not necessarily a summary or gist of the contents of the document. Rather, it corresponds to predefined generic types of information of interest and represents specific instances found in the text. For example, a user of a system may be interested in identifying and databasing information on all companies named within a set of documents, including companies not previously known to the user. An information extraction system can extract and output all of the occurrences of company names within a text with an accuracy of 75%. Moreover, it is possible to specify that the system extract only those companies of a certain type, such as Japanese companies or companies in the textile industry.
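The contrast the abstract draws between retrieval and extraction can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the report: the function names, the suffix-based regular expression, the sample sentences, and the `company_types` lookup are all hypothetical, and systems of the kind described rely on far richer linguistic analysis than a single pattern.

```python
import re

# Hypothetical pattern: one or more capitalized words followed by a
# corporate suffix. This is an illustrative assumption, not the report's method.
COMPANY_PATTERN = re.compile(
    r"\b((?:[A-Z][A-Za-z&]*\s)+(?:Corp|Inc|Ltd|Co)\.)"
)


def retrieve(documents, keyword):
    """IR-style step: return the subset of documents that mention keyword."""
    return [doc for doc in documents if keyword.lower() in doc.lower()]


def extract_companies(document):
    """IE-style step: return company-name instances found inside one document."""
    return [match.group(1) for match in COMPANY_PATTERN.finditer(document)]


def extract_companies_of_type(document, company_types, wanted_type):
    """Keep only extracted companies whose (hypothetical) type matches."""
    return [
        name
        for name in extract_companies(document)
        if company_types.get(name) == wanted_type
    ]


# Made-up sample data for the sketch.
documents = [
    "Toyo Textile Co. and Acme Steel Inc. announced a joint venture.",
    "Rainfall totals were above average this spring.",
]
company_types = {"Toyo Textile Co.": "textile", "Acme Steel Inc.": "steel"}

relevant_docs = retrieve(documents, "joint venture")
all_companies = extract_companies(relevant_docs[0])
textile_only = extract_companies_of_type(relevant_docs[0], company_types, "textile")
```

In this sketch, `retrieve` narrows the collection to whole documents, while `extract_companies` returns specific instances of a predefined information type from within a document, including names the user did not supply in advance. A pattern this simple will miss or mangle many real names (for example, it would absorb a capitalized word preceding the name), which is one reason extraction accuracy falls short of perfect.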

Author(s):

Okurowski, Mary E.

Author Organization(s):

NATIONAL COMPUTER SECURITY CENTER FORT GEORGE G MEADE MD

Descriptive Note:

Conference paper

Supplementary Note:

TIPSTER TEXT PROGRAM: PHASE I: Proceedings of a Workshop held at Fredericksburg, Virginia, September 19-23, 1993. Sponsored by the Advanced Research Projects Agency.

Pagination:

0006

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution:

Approved For Public Release

Distribution Statement:

Approved For Public Release; Distribution Is Unlimited.

RECORD

Collection: TR

Identifying Numbers

Monitor Series:

CSC

Subject Terms

Joint Capability Areas:

JCA_5_Command and Control; JCA_5.2_Understand; JCA_5.2.2_Develop Knowledge and Situational Awareness; JCA_5.5.2_Task; JCA_5.5_Direct; JCA_1_Force Support; JCA_1.2_Force Preparation; JCA_1.2.3_Educating; JCA_8_Building Partnerships; JCA_5.3_Planning; JCA_1.2.6_Concepts; JCA_8.2_Shape

Communities of Interest:

Materials and Manufacturing Processes

Descriptor(s):

*EXTRACTION, *INFORMATION RETRIEVAL, ACCURACY, INFORMATION SYSTEMS, LIBRARIES, OUTPUT, TEXT PROCESSING

Field(s)/Group(s):

Information Science

Keyword(s):

TIPSTER TEXT PROGRAM

Report Date:

1993 Sep 01

Creation Date:

2016 Jul 11