Effective Structured Query Formulation for Session Search
GEORGETOWN UNIV WASHINGTON DC DEPT OF COMPUTER SCIENCE
In this work, we focus on formulating effective structured queries for session search. For a given query qn, phrase-like text nuggets are identified and formulated into Lemur queries that are fed to the Lemur search engine. Nuggets are substrings of qn, similar to phrases but not necessarily as semantically coherent. We assume that a valid nugget appears frequently in the top returned snippets for qn. The longest sequences of words consisting of frequent bigrams within the top returned snippets are identified as nuggets and used to formulate a new query. Formulating a structured query from the nuggets substantially improves search accuracy over using qn alone. We experiment with both strict and relaxed forms of structured query formulation. On the TREC 2011 query sets, the strict form achieves an improvement of 13.5% and the relaxed form an improvement of 17.8% in nDCG@10. We further combine the nuggets generated from all queries q1, ..., qn-1, qn to formulate one structured session query for the entire session. Nuggets from each query are weighted to indicate their relation to the current query and their potential contribution to retrieval performance. We experiment with three weighting schemes: uniform (all queries share the same weight), previous vs. current (previous queries q1, ..., qn-1 share the same weight, while qn uses a different, higher weight), and distance-based (weights are distributed according to how far a query's position in the session is from the current query). We find that previous vs. current achieves the best search accuracy. For retrieval, we first retrieve a large pool of documents for qn. We then employ a re-ranking model that considers dwell time as well as the similarity between clicked documents and documents in the pool.
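The nugget extraction and session weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The function names, the whitespace tokenizer, the `min_count` threshold, the `current_weight` value, and the inverse-distance formula are all assumptions made for illustration; the abstract does not specify them. The query string uses the Indri/Lemur operators `#weight`, `#combine`, and the ordered-window operator `#N`; treating `#1` as the strict form and a larger window as the relaxed form is likewise an assumption.

```python
from collections import Counter


def frequent_bigrams(snippets, min_count=2):
    """Return word bigrams occurring at least min_count times in the snippets."""
    counts = Counter()
    for snippet in snippets:
        tokens = snippet.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return {bigram for bigram, c in counts.items() if c >= min_count}


def extract_nuggets(query, snippets, min_count=2):
    """Longest runs of query words whose adjacent pairs are all frequent bigrams."""
    tokens = query.lower().split()
    frequent = frequent_bigrams(snippets, min_count)
    nuggets, run = [], tokens[:1]
    for prev, cur in zip(tokens, tokens[1:]):
        if (prev, cur) in frequent:
            run.append(cur)          # extend the current nugget
        else:
            if len(run) > 1:         # single words are not kept as nuggets here
                nuggets.append(" ".join(run))
            run = [cur]
    if len(run) > 1:
        nuggets.append(" ".join(run))
    return nuggets


def session_weights(n, scheme, current_weight=2.0):
    """Normalized weights for queries q1..qn under one of three schemes."""
    if scheme == "uniform":
        raw = [1.0] * n
    elif scheme == "previous_vs_current":
        raw = [1.0] * (n - 1) + [current_weight]
    elif scheme == "distance":
        # Assumed form: weight decays as 1 / (1 + distance to qn).
        raw = [1.0 / (n - i) for i in range(n)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [w / total for w in raw]


def structured_query(session_nuggets, weights, window=1):
    """Build one Indri-style session query; window=1 is the assumed strict form."""
    parts = []
    for nuggets, weight in zip(session_nuggets, weights):
        if not nuggets:
            continue                 # a query without nuggets contributes nothing
        phrases = " ".join(f"#{window}({nugget})" for nugget in nuggets)
        parts.append(f"{weight:.2f} #combine({phrases})")
    return "#weight(" + " ".join(parts) + ")"
```

For example, with the hypothetical snippets "Hotels in New York City" and "visit new york city today", the query "cheap hotel new york city" yields the single nugget "new york city", because only the bigrams "new york" and "york city" occur in both snippets.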
APPROVED FOR PUBLIC RELEASE