Accession Number:

ADA455426

Title:

HARD Track Overview in TREC 2004: High Accuracy Retrieval from Documents

Descriptive Note:

Research paper

Corporate Author:

MASSACHUSETTS UNIV AMHERST CENTER FOR INTELLIGENT INFORMATION RETRIEVAL

Personal Author(s):

Report Date:

2004-01-01

Pagination or Media Count:

12.0

Abstract:

The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems. It does so by considering three questions.

First, can additional metadata about the query, the searcher, or the context of the search provide more focused and, therefore, more accurate results? These metadata items generally do not directly affect whether or not a document is on topic, but they do affect whether it is relevant. For example, a person looking for introductory material will not find an on-topic but highly technical document relevant.

Second, can highly focused, short-duration interaction with the searcher be used to improve the accuracy of a system? Participants created clarification forms that were generated in response to a query, leveraged any information available in the corpus, and were filled out by the searcher. Typical clarification questions might ask whether some titles seem relevant, whether some words or names are on topic, or whether a short passage of text is related.

Third, can passage retrieval be used to focus attention effectively on relevant material, increasing accuracy by eliminating unwanted text in an otherwise useful document? For this aspect of the problem, there are challenges not only in finding relevant passages but also in determining how best to evaluate the results.

The HARD track ran for the second time in TREC 2004. It used a new corpus and a new set of 50 topics for evaluation. All topics included metadata, and clarification forms were considered for each of them. Because of the expense of sub-document relevance judging, only half of the topics were used in the passage-level evaluation. A total of 16 sites participated in HARD, up from 14 the year before. Interest remains strong, so the HARD track will run again in TREC 2005, but because of funding uncertainties it will address only a subset of these issues.

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE