LIA at TREC 2012 Web Track: Unsupervised Search Concepts Identification from General Sources of Information
LABORATOIRE INFORMATIQUE AVIGNON (FRANCE)
In this paper, we report the experiments we conducted for our participation in the TREC 2012 Web Track. We experimented with a new system that models the latent concepts underlying a query. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. Our approach automatically estimates both the number of latent concepts and the number of feedback documents needed, without any prior training step. These concepts are incorporated into the ranking function with the aim of promoting documents that refer to many different query-related themes. We also explored the use of different types of information sources for modeling the latent concepts. For this purpose, we use four general sources of information of various kinds (web, news, encyclopedic), from which the feedback documents are extracted.
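To illustrate how latent concepts might enter a ranking function, here is a minimal sketch in Python. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the linear interpolation controlled by `lam`, the additive smoothing controlled by `alpha`, the function names, and the toy data. The two concepts are written by hand here, whereas the system described above learns them with LDA from feedback documents.

```python
import math
from collections import Counter


def term_log_prob(term, counts, doc_len, vocab_size, alpha=0.1):
    """Log-probability of a term under an additively smoothed unigram document model."""
    return math.log((counts[term] + alpha) / (doc_len + alpha * vocab_size))


def score(doc_tokens, query_terms, concepts, vocab_size, lam=0.5):
    """Interpolate a query-likelihood score with a concept-based score.

    `concepts` is a list of (concept_weight, {word: probability}) pairs.
    This combination is an assumed form, not the formula from the paper.
    """
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    query_part = sum(
        term_log_prob(t, counts, doc_len, vocab_size) for t in query_terms
    )
    concept_part = 0.0
    for concept_weight, word_probs in concepts:
        for word, prob in word_probs.items():
            concept_part += concept_weight * prob * term_log_prob(
                word, counts, doc_len, vocab_size
            )
    return lam * query_part + (1 - lam) * concept_part


def rank(docs, query_terms, concepts, vocab_size, lam=0.5):
    """Return document names sorted from highest to lowest score."""
    return sorted(
        docs,
        key=lambda name: score(docs[name], query_terms, concepts, vocab_size, lam),
        reverse=True,
    )


# Hypothetical toy data: an ambiguous query with two hand-written concepts.
QUERY = ["jaguar"]
CONCEPTS = [
    (0.6, {"car": 0.7, "engine": 0.3}),
    (0.4, {"cat": 0.6, "jungle": 0.4}),
]
DOCS = {
    "both": "jaguar car engine cat jungle".split(),
    "one": "jaguar car car car car".split(),
    "none": "jaguar price price price price".split(),
}

print(rank(DOCS, QUERY, CONCEPTS, vocab_size=10))
```

In this toy setting all three documents match the query term equally, so the ordering is decided by the concept term alone: the document that mentions both concepts ranks above the one that covers a single concept, which in turn ranks above the one that covers neither.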
- Information Science