Distributed EDLSI, BM25, and Power Norm at TREC 2008
URSINUS COLLEGE COLLEGEVILLE PA DEPT OF MATHEMATICS AND COMPUTER
This paper describes our participation in the TREC Legal competition in 2008. Our first set of experiments involved the use of Latent Semantic Indexing (LSI) with a small number of dimensions, a technique we refer to as Essential Dimensions of Latent Semantic Indexing (EDLSI). Because the experimental dataset is large, we designed a distributed version of EDLSI to use for our submitted runs. We submitted two runs using distributed EDLSI, one with k = 10 and another with k = 41, where k is the dimensionality reduction parameter for LSI. We also submitted a traditional vector space baseline for comparison with the EDLSI results. This article describes our experimental design and the results of these experiments. We find that EDLSI clearly outperforms traditional vector space retrieval on a variety of TREC reporting metrics. We also describe experiments that were designed as a follow-up to our TREC Legal 2007 submission. These experiments test weighting and normalization schemes as well as techniques for relevance feedback. Our primary intent was to compare the BM25 weighting scheme to our power normalization technique. BM25 outperformed all of our other submissions on the competition metric (F1 at K) for both the ad hoc and relevance feedback tasks, but power normalization outperformed BM25 in our ad hoc experiments when the 2007 metric (estimated recall at B) was used for comparison.
- Numerical Mathematics
- Computer Programming and Software
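
For readers unfamiliar with the BM25 weighting scheme named in the abstract, the following is a minimal sketch of the commonly published form of BM25 scoring. It is not the authors' implementation: the function name `bm25_scores`, the toy documents, the smoothed IDF variant, and the parameter values k1 = 1.2 and b = 0.75 (conventional defaults) are all assumptions made for illustration. Note that `k1` here is a BM25 saturation parameter and is unrelated to the LSI dimensionality parameter k in the abstract.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are conventional defaults, assumed here; the abstract does not
    state which values were used in the TREC runs.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy collection, already tokenized.
docs = [
    ["patent", "infringement", "claim"],
    ["tobacco", "advertising", "memo", "memo"],
    ["patent", "license", "memo"],
]
scores = bm25_scores(["patent", "claim"], docs)
```

In this toy collection the first document matches both query terms, the third matches one, and the second matches none, so the scores rank the documents in that order.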