Accession Number:

ADA470538

Title:

Natural Language Information Retrieval: TREC-4 Report

Descriptive Note:

Conference proceedings

Corporate Author:

GE CORPORATE RESEARCH AND DEVELOPMENT SCHENECTADY NY

Report Date:

1995-11-01

Pagination or Media Count:

16

Abstract:

In this paper we report on the joint GE/NYU natural language information retrieval project as related to the 4th Text Retrieval Conference (TREC-4). The main thrust of this project is to use natural language processing techniques to enhance the effectiveness of full-text document retrieval. During the course of the four TREC conferences, we have built a prototype IR system designed around a statistical full-text indexing and search backbone provided by NIST's Prise engine. The original Prise has been modified to allow handling of multi-word phrases, differential term weighting schemes, automatic query expansion, index partitioning and rank merging, as well as to deal with complex documents. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. The overall architecture of the system is essentially the same as in TREC-3, as our efforts this year were directed at optimizing the performance of all components. A notable exception is the new massive query expansion module used in routing experiments, which replaces the prototype extension used in the TREC-3 system. On the other hand, it has to be noted that the character and the level of difficulty of TREC queries have changed quite significantly since last year's evaluation. The new TREC-4 ad-hoc queries are far shorter and less focused, and they have the flavor of information requests ("What is the prognosis of ...") rather than the search directives typical of earlier TRECs ("The relevant document will contain ..."). This makes building good search queries a more sensitive task than before. We thus decided to introduce only a minimum number of changes to our indexing and search processes,

Subject Categories:

  • Information Science
  • Linguistics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE