Incorporating Non-Relevance Information in the Estimation of Query Models
AMSTERDAM UNIV (NETHERLANDS)
The authors describe the participation of the University of Amsterdam's Information and Language Processing Systems (ILPS) group in the Relevance Feedback track at TREC 2008. They introduce a new model that incorporates information from both relevant and nonrelevant documents to improve the estimation of query models. The study addresses three research questions. First, can nonrelevance information be modeled effectively to improve the estimation of a query model? Second, given the proposed model, how does the size of the set of nonrelevant documents relative to the set of relevant documents affect retrieval effectiveness? Third, does explicit nonrelevance information help, and if so, when? In other words, what happens when the estimates on the nonrelevant documents are replaced with more general estimates, such as estimates from the collection? The proposed model leverages the distance between each relevant document and the set of nonrelevant documents by penalizing terms that occur frequently in the latter, similar to the intuitions described by Wang et al. (2008). Instead of subtracting probabilities, however, the authors take a more principled approach based on the Normalized Log Likelihood Ratio (NLLR). Their main findings are twofold: (1) in terms of statMAP, a larger number of documents judged nonrelevant improves retrieval effectiveness, and (2) on the TREC Terabyte topics, the estimates on the documents judged nonrelevant can effectively be replaced with estimates on the document collection.
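The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: a log-likelihood-ratio score that rewards a document for terms that are likely under the relevant set and penalizes it for terms that are frequent in the nonrelevant set. The function names (`unigram_model`, `smooth`, `nllr_score`), the Jelinek-Mercer smoothing, and the weight `lam` are all assumptions made for illustration, not the authors' implementation. Passing a collection model in place of the nonrelevant model would correspond to the substitution raised in the third research question.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def smooth(model, collection, lam=0.5):
    """Jelinek-Mercer smoothing (an assumed choice): mix a model with the
    collection model so every collection term has nonzero probability."""
    return {
        t: (1 - lam) * model.get(t, 0.0) + lam * p_c
        for t, p_c in collection.items()
    }


def nllr_score(doc_model, rel_model, nonrel_model):
    """Sum over document terms of P(t|D) * log(P(t|R) / P(t|N)).

    Terms that are more likely under the nonrelevant model than under the
    relevant model contribute negatively, i.e. they are penalized.
    Both rel_model and nonrel_model are assumed to be smoothed, and the
    document is assumed to be drawn from the same collection.
    """
    return sum(
        p * math.log(rel_model[t] / nonrel_model[t])
        for t, p in doc_model.items()
    )


# Toy data, invented purely for illustration.
relevant = [["query", "model", "feedback"], ["query", "model", "estimation"]]
nonrelevant = [["football", "score", "model"]]

collection = unigram_model([t for d in relevant + nonrelevant for t in d])
rel = smooth(unigram_model([t for d in relevant for t in d]), collection)
nonrel = smooth(unigram_model([t for d in nonrelevant for t in d]), collection)

score_a = nllr_score(unigram_model(relevant[0]), rel, nonrel)
score_b = nllr_score(unigram_model(nonrelevant[0]), rel, nonrel)
```

In this toy setup the document taken from the relevant set receives a positive score and the one taken from the nonrelevant set a negative score; the term "model", which is equally likely under both smoothed models, contributes nothing either way.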
- Information Science
- Statistics and Probability