Accession Number:

ADA618578

Title:

Towards a Simple and Efficient Web Search Framework

Descriptive Note:

Conference paper

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA LANGUAGE TECHNOLOGIES INST

Personal Author(s):

Report Date:

2014-11-01

Pagination or Media Count:

8

Abstract:

The Web Track of the 2014 Text REtrieval Conference (TREC) addresses the most fundamental problem of Information Retrieval. We did not intend to craft a system that beats state-of-the-art search engines, but to design a lightweight and cost-effective system with comparable performance. We introduce a two-pass retrieval framework: the first pass is a simple and efficient retrieval model that focuses on recall; the second pass runs a wave of feature extraction algorithms on the set of top-ranked documents, followed by Learning to Rank (LETOR) algorithms that provide different precision-oriented rankings, whose outputs are combined using data fusion. We have focused on using statistical Language Models with novel and well-known smoothing techniques, different LETOR methods, and various data fusion techniques. In addition, we tried topic modelling with Hierarchical Dirichlet Allocation for query expansion in the hope of improving the diversity of our results. However, the topic modelling approach turned out to be unsuccessful, and we have not been able to identify the problem or benefit from the approach in this work. We also present further analyses demonstrating that our approach is robust against overfitting, along with some general studies on overfitting in the context of LETOR.
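
The two-pass idea in the abstract can be illustrated with a minimal sketch. The abstract does not name its smoothing or fusion methods, so everything specific here is an assumption: Dirichlet smoothing stands in for the "well-known smoothing techniques", CombSUM with min-max normalization stands in for the "data fusion techniques", and the function names (dirichlet_score, first_pass, comb_sum) and the value of mu are hypothetical. The LETOR rerankers are not implemented; their outputs are represented as plain score dictionaries.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, coll_tf, coll_len, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed unigram language model.

    Assumed stand-in for the paper's smoothing; mu is an illustrative default.
    """
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def first_pass(query, docs, k, mu=2000.0):
    """Recall-oriented pass: return the ids of the k highest-scoring documents."""
    coll_tf = Counter(term for doc in docs.values() for term in doc)
    coll_len = sum(coll_tf.values())
    scores = {
        doc_id: dirichlet_score(query, doc, coll_tf, coll_len, mu)
        for doc_id, doc in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def comb_sum(score_lists):
    """CombSUM fusion: sum min-max normalized scores from several rankers.

    Each element of score_lists maps doc id -> score from one (hypothetical)
    second-pass ranker.
    """
    fused = Counter()
    for scores in score_lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, s in scores.items():
            fused[doc_id] += (s - lo) / span if span else 0.0
    return sorted(fused, key=fused.get, reverse=True)


# Toy collection: tokenized documents keyed by id (made-up data).
docs = {
    "d1": ["web", "search", "engine"],
    "d2": ["cooking", "recipes", "pasta"],
    "d3": ["web", "search", "web", "search"],
}
candidates = first_pass(["web", "search"], docs, k=2)

# Scores that two hypothetical LETOR rerankers might assign to the documents.
ranker_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
ranker_b = {"d1": 0.2, "d2": 0.1, "d3": 0.8}
final_ranking = comb_sum([ranker_a, ranker_b])
```

In this sketch the first pass only narrows the collection to a candidate set, and the fusion step is independent of how the second-pass scores were produced, which is what would let several precision-oriented rankers be combined.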

Subject Categories:

  • Information Science
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE