Accession Number : ADA477505


Title :   Topic Models in Information Retrieval


Descriptive Note : Doctoral thesis


Corporate Author : MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE


Personal Author(s) : Wei, Xing


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a477505.pdf


Report Date : Aug 2007


Pagination or Media Count : 145


Abstract : Topic modeling reveals semantic relations among words, which should be helpful for information retrieval tasks. We present probability mixture modeling and term modeling methods to integrate topic models into the language modeling framework for information retrieval. A variety of topic modeling techniques have been proposed for or introduced into IR tasks, including manually built query models, term similarity measures, and latent mixture models, especially Latent Dirichlet Allocation (LDA), a formal generative latent mixture model of documents. We investigate and evaluate these techniques on several TREC collections within the presented frameworks and show that significant improvements over previous work can be obtained. Practical issues such as efficiency and scalability are discussed and compared across the different topic models. Other recent topic modeling techniques are also discussed.
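The abstract mentions integrating topic models into the language modeling framework through probability mixture modeling. A minimal sketch of one way such a mixture can look follows. It assumes a query-likelihood ranker in which each document's term distribution linearly interpolates a Dirichlet-smoothed document model with a topic-based estimate P(w|d) = sum_z P(w|z) P(z|d). The record does not give the exact formulation, so the mixture form, the function names (`topic_term_prob`, `query_log_likelihood`), and the parameters `mu` and `lam` are hypothetical. The topic distributions `theta_d` and `phi` are assumed to have been estimated beforehand, for example by LDA.

```python
import math
from collections import Counter


def topic_term_prob(word, theta_d, phi):
    """Topic-based term probability: sum over topics z of P(w | z) * P(z | d).

    theta_d: list where theta_d[z] = P(z | d) (assumed pre-estimated, e.g. by LDA).
    phi: list of dicts where phi[z][w] = P(w | z).
    """
    return sum(p_z * phi[z].get(word, 0.0) for z, p_z in enumerate(theta_d))


def query_log_likelihood(query, doc_tokens, coll_prob, theta_d, phi, mu=1000.0, lam=0.7):
    """Score a document by log P(Q | D) under a hypothetical linear mixture:
    lam * (Dirichlet-smoothed document model) + (1 - lam) * (topic-based model).

    coll_prob: dict mapping word -> P(w | C), the collection language model.
    mu: Dirichlet prior; lam: mixing weight. Both are illustrative defaults.
    """
    counts = Counter(doc_tokens)
    n_d = len(doc_tokens)
    score = 0.0
    for w in query:
        # Maximum-likelihood estimate from the document itself.
        p_ml = counts[w] / n_d if n_d else 0.0
        # Dirichlet smoothing toward the collection model.
        p_dir = (n_d / (n_d + mu)) * p_ml + (mu / (n_d + mu)) * coll_prob.get(w, 0.0)
        # Linear interpolation with the topic-based estimate.
        p = lam * p_dir + (1.0 - lam) * topic_term_prob(w, theta_d, phi)
        # Small floor guards against log(0) for words unseen everywhere.
        score += math.log(max(p, 1e-12))
    return score


# Toy usage with invented numbers: two topics, three-word vocabulary.
phi = [{"car": 0.6, "engine": 0.4}, {"apple": 1.0}]
coll_prob = {"car": 0.3, "engine": 0.2, "apple": 0.5}
print(query_log_likelihood(["engine"], ["car", "car", "engine"], coll_prob, [0.9, 0.1], phi, mu=3.0, lam=0.5))
```

Interpolating rather than replacing the document model keeps exact-match evidence while letting the topic estimate assign probability to related words that do not appear in the document.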


Descriptors :   *MODELS , *SEMANTICS , *INFORMATION RETRIEVAL , THESES , CLUSTERING , WORDS(LANGUAGE) , VOCABULARY


Subject Categories : Information Science
      Linguistics


Distribution Statement : APPROVED FOR PUBLIC RELEASE