Accession Number:

ADA383044

Title:

Ontology-Based Information Extraction from Free-Form Text

Descriptive Note:

Final rept. 30 Mar-Sep 2000

Corporate Author:

STOTTLER HENKE ASSOCIATES INC SAN MATEO CA

Personal Author(s):

Report Date:

2000-10-06

Pagination or Media Count:

50

Abstract:

Report developed under SBIR contract. In this Phase I SBIR research, we demonstrated the feasibility of an information extraction (IE) system that can leverage semantic representations to significantly increase end-to-end recall for the IE task while maintaining or improving precision. Our end-to-end Ontology-Based IE (OBIE) system combines machine learning techniques with a novel architecture built around a shared domain ontology. This architecture enables different levels of the IE processing stream to interact simultaneously through the shared ontology. By incorporating hierarchical knowledge in their learning algorithms, IE modules can perform their extraction tasks with greater depth and accuracy. Bootstrapping algorithms were extended to automatically learn the ontology of a new domain, to assist in training the IE components, and to reduce the burden of annotation on the end user. Broad-coverage and rare-case extraction rules were augmented by classifiers induced from the trained ontology to shore up the precision typically lost by such rules. Performance metrics allow a preliminary characterization of the recall and precision gains enabled by the proposed architecture. Our Phase I research and development of a proof-of-concept prototype demonstrated the feasibility and utility of OBIE's ontology-based IE capability and provides a solid foundation for our Phase implementation.

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE