Overview of the TREC 2009 Web Track
WATERLOO UNIV (ONTARIO)
The TREC Web Track explores and evaluates Web retrieval technologies. Currently, the Web Track conducts experiments using the new billion-page ClueWeb09 collection. The TREC 2009 Web Track includes both a traditional ad hoc retrieval task and a new diversity task. The goal of the diversity task is to return a ranked list of pages that together provide complete coverage for a query while avoiding excessive redundancy in the result list. Topics for the track were created from the logs of a commercial search engine, with the aid of tools developed at Microsoft Research. Given a target query, these tools extracted and analyzed groups of related queries, using co-clicks and other information, to identify clusters of queries that highlight different aspects and interpretations of the target query. NIST used these clusters for topic development. Each resulting topic is structured as a representative set of subtopics, each related to a different user need. Documents were judged with respect to the subtopics as well as with respect to the topic as a whole. For each subtopic, NIST assessors made a binary judgment as to whether or not the document satisfies the information need associated with that subtopic. The same topics were used for both the ad hoc task and the diversity task. A total of 26 groups submitted runs to the track, with many groups participating in both tasks. This report provides an overview of the track, including topic development, evaluation measures, and results.
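To make the idea of coverage without redundancy concrete, the following is a minimal sketch of subtopic recall at rank k computed from binary per-subtopic judgments. It is an illustration only and is not presented as one of the track's official evaluation measures. The function name, the document identifiers, and the in-memory judgment layout are all hypothetical.

```python
from typing import Dict, List, Set


def subtopic_recall(
    ranking: List[str],
    judgments: Dict[str, Set[int]],
    num_subtopics: int,
    k: int,
) -> float:
    """Fraction of a topic's subtopics covered by the top-k ranked documents.

    judgments maps a document id to the set of subtopic numbers that the
    document was judged (binary) to satisfy. This layout is an assumption
    made for the sketch, not a real judgment-file format.
    """
    if num_subtopics <= 0:
        raise ValueError("num_subtopics must be positive")
    covered: Set[int] = set()
    for doc_id in ranking[:k]:
        # Unjudged documents are treated as satisfying no subtopic.
        covered |= judgments.get(doc_id, set())
    return len(covered) / num_subtopics


# Hypothetical topic with three subtopics and four judged documents.
judgments = {"d1": {1}, "d2": {1}, "d3": {2, 3}, "d4": set()}

redundant = ["d1", "d2", "d4"]  # the top two documents cover the same subtopic
diverse = ["d1", "d3", "d4"]    # the top two documents cover all three subtopics

print(subtopic_recall(redundant, judgments, num_subtopics=3, k=3))
print(subtopic_recall(diverse, judgments, num_subtopics=3, k=3))
```

In this toy example the redundant ranking covers one of three subtopics and the diverse ranking covers all three, although both contain the same number of documents that satisfy at least one subtopic.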
- Information Science
- Computer Programming and Software