Measuring Robustness with First Relevant Score in the TREC 2012 Microblog Track
OPEN TEXT CORPORATION OTTAWA (ONTARIO)
Pagination or Media Count:
In this paper, we measure the effectiveness of various experimental search techniques not just with traditional TREC ad hoc search measures such as Average Precision, R-precision and Precision at 30, but also with robust measures based on just the rank of the first relevant item retrieved such as First Relevant Score and Generalized Success at 30. We report the results of our experiments conducted in the context of the Real-Time Adhoc Search Task of the TREC 2012 Microblog Track which investigated the effectiveness of ad hoc search of a collection of more than 10 million tweets. For the experimental technique of favoring tweets with urls, we found that both the traditional and robust measures indicated statistically significant increases in the mean score. However, for an experimental blind feedback technique, a technique known to be non-robust as it typically makes poor results even worse, the traditional Average Precision measure indicated a statistically significant increase in the mean score, but some of the measures just based on the rank of the first relevant item successfully discerned a statistically significant decrease in the mean score from the non-robust technique.
- Information Science