Accession Number:

ADA460210

Title:

Automatic Pattern Acquisition for Japanese Information Extraction

Descriptive Note:

Corporate Author:

NEW YORK UNIV NY DEPT OF COMPUTER SCIENCE

Report Date:

2001-01-01

Pagination or Media Count:

8

Abstract:

One of the central issues in information extraction is the cost of customizing a system from one scenario to another. Research on the automated acquisition of patterns is therefore important for portability and scalability. In this paper, we introduce the Tree-Based Pattern representation, in which a pattern is denoted as a path in the dependency tree of a sentence. We outline a procedure for acquiring Tree-Based Patterns in Japanese from unannotated text. The system extracts relevant sentences from the training data based on TF-IDF scoring, and the paths common to the parse trees of those sentences are taken as the extracted patterns.
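
The two steps named in the abstract (TF-IDF-based sentence selection, then collecting paths shared across dependency trees) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the toy English tokens, the `(word, head_index)` tree encoding, the mean-weight sentence score, the thresholds, the reading of "path" as a root-to-node chain, and all function names.

```python
import math
from collections import Counter

# A sentence is a list of (word, head_index) pairs; head_index -1 marks the root.
# All tokens below are made-up placeholders, not data from the report.

def words(doc):
    """Flatten a document (a list of sentences) into its words."""
    return [w for sent in doc for w, _ in sent]

def tfidf_weights(relevant_docs, all_docs):
    """Term frequency in the relevant documents times log inverse document frequency."""
    tf = Counter(w for d in relevant_docs for w in words(d))
    df = Counter(w for d in all_docs for w in set(words(d)))
    n = len(all_docs)
    return {w: tf[w] * math.log(n / df[w]) for w in tf}

def sentence_score(sent, weights):
    """Assumed scoring rule: mean TF-IDF weight of the words in the sentence."""
    return sum(weights.get(w, 0.0) for w, _ in sent) / len(sent)

def root_paths(sent):
    """Yield each root-to-node chain (length > 1) as a tuple of words."""
    for i in range(len(sent)):
        path, j = [], i
        while j != -1:
            word, head = sent[j]
            path.append(word)
            j = head
        if len(path) > 1:
            yield tuple(reversed(path))

def common_paths(sents, min_count=2):
    """Keep the paths that occur in at least min_count sentences."""
    counts = Counter(p for s in sents for p in set(root_paths(s)))
    return {p for p, c in counts.items() if c >= min_count}

doc_a = [  # assumed to be the scenario-relevant document
    [("company", 2), ("president", 2), ("appointed", -1)],
    [("board", 2), ("president", 2), ("appointed", -1)],
]
doc_b = [[("weather", 1), ("improved", -1)]]
doc_c = [[("company", 1), ("moved", -1)]]
all_docs = [doc_a, doc_b, doc_c]

weights = tfidf_weights([doc_a], all_docs)
relevant = [s for d in all_docs for s in d if sentence_score(s, weights) >= 1.0]
patterns = common_paths(relevant)
print(patterns)  # {('appointed', 'president')}
```

In this toy run only the two sentences of `doc_a` pass the score threshold, and the single path they share is kept as a pattern. A real system would obtain the trees from a Japanese dependency parser rather than hand-written tuples.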

Subject Categories:

  • Information Science
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE