Effective Retrieval with Distributed Collections
MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE
This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are inadequate for selecting the right collections. We propose two techniques to address this problem: using phrase information in the collection selection index, and query expansion. Both techniques increase the ability of typical queries to discriminate among collections and hence significantly improve retrieval results. Query expansion in particular brings the effectiveness of searching a large set of distributed collections close to that of searching a centralized collection.
- Computer Systems
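The abstract does not specify how collections are ranked, so the following is only a minimal sketch of why phrase information can help collection selection. It assumes a toy selection index that stores, for each collection, the number of documents containing each term, and ranks collections by summing each query term's document fraction. The scoring function, the collection names and the documents are all hypothetical and are not the paper's method. With single words alone, the query "stock market" cannot separate the two toy collections. Indexing adjacent-word phrases as well lets the phrase "stock market" favor the collection where the words actually occur together.

```python
from collections import Counter

# Hypothetical toy collections; not data from the paper.
COLLECTIONS = {
    "finance": ["stock market crash today", "bond yields rise"],
    "farming": ["fish market report", "stock the pond with fish"],
}

def index_terms(text, use_phrases):
    """Split text into lowercase words, optionally adding adjacent-word phrases."""
    words = text.lower().split()
    terms = list(words)
    if use_phrases:
        terms += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return terms

def build_selection_index(collections, use_phrases):
    """Map each collection to (document frequency per term, number of documents)."""
    index = {}
    for name, docs in collections.items():
        df = Counter()
        for doc in docs:
            df.update(set(index_terms(doc, use_phrases)))  # count each term once per doc
        index[name] = (df, len(docs))
    return index

def rank_collections(index, query_terms):
    """Score = sum of per-term document fractions; ties broken by collection name."""
    scores = {
        name: sum(df[t] / n_docs for t in query_terms)
        for name, (df, n_docs) in index.items()
    }
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return order, scores

if __name__ == "__main__":
    for use_phrases in (False, True):
        idx = build_selection_index(COLLECTIONS, use_phrases)
        order, scores = rank_collections(idx, index_terms("stock market", use_phrases))
        print(use_phrases, order, scores)
```

Without phrases both collections score 1.0 and the tie is broken only by name; with phrases "finance" scores 1.5 against 1.0 for "farming". Query expansion would act along similar lines in this toy model, adding further discriminating terms to the query before ranking collections.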