IRRA at TREC 2009: Index Term Weighting based on Divergence From Independence Model
MUGLA UNIV (TURKEY) DEPT OF STATISTICS
The IRRA (IR-Ra) group participated in the 2009 Web track (both the adhoc task and the diversity task) and the Million Query track. This year, the major concern is to examine the effectiveness of a novel, nonparametric index term weighting model, divergence from independence (DFI). The notion of independence, which underlies the well-known exploratory statistical data analysis technique called correspondence analysis (Greenacre, 1984; Jambu, 1991), can be adapted to the index term weighting problem. In this respect, it can be thought of as a qualitative description of the importance of terms for the documents in which they appear, where importance means contribution to the information content of a document relative to other terms. According to the independence notion, if the ratio of the frequencies of two different terms is the same across documents, the terms are independent of the documents. For example, each Web page contains a pair of html tags and a pair of body tags, so the ratio of the frequencies of these tags is the same across all Web pages, indicating that the html and body tags are independent of Web pages.
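The independence notion can be illustrated with a small sketch. It uses the standard contingency-table definition of expected frequency under independence (row total times column total, divided by the grand total), as in correspondence analysis. It is not the DFI weighting formula itself, which the abstract does not state. The function names and the toy counts are assumptions made for illustration only.

```python
def expected_under_independence(observed):
    """Expected counts if terms were independent of documents.

    `observed` is a term-by-document table of frequencies:
    rows are terms, columns are documents (illustrative layout).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def divergence(observed):
    """Observed minus expected count for every (term, document) cell."""
    expected = expected_under_independence(observed)
    return [
        [o - e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Toy data (hypothetical): three Web pages.
# html and body tags each occur twice on every page, so their
# frequency ratio is constant and nothing diverges from independence.
html_vs_body = [
    [2, 2, 2],  # html
    [2, 2, 2],  # body
]

# A content-bearing term whose frequency varies across pages
# breaks the constant ratio and yields nonzero divergence.
html_vs_term = [
    [2, 2, 2],  # html
    [0, 5, 1],  # hypothetical content term
]

print(divergence(html_vs_body))
print(divergence(html_vs_term))
```

In the first table every cell equals its expected value, matching the html and body example above. In the second, the content term occurs more often than expected in the second page, which is the kind of departure from independence a term weighting model can exploit.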
- Information Science
- Computer Programming and Software
- Test Facilities, Equipment and Methods