Accession Number:

ADA581520

Title:

Ensemble Clustering for Result Diversification

Descriptive Note:

Conference paper

Corporate Author:

TWENTE UNIV ENSCHEDE (NETHERLANDS)

Personal Author(s):

Report Date:

2012-11-01

Pagination or Media Count:

5

Abstract:

This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods, such as latent Dirichlet allocation (LDA) and K-means, with clusters obtained from different types of data, such as document text and anchor text. Our two-layer ensemble run performed better than the LDA-based diversification run and also better than a non-diversified run.
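
The abstract does not say how the individual clusterings were combined. As an illustration only, the sketch below shows one common way to build an ensemble: a co-association (evidence accumulation) scheme in which two documents end up in the same consensus cluster when more than a given fraction of the base clusterings agree on them. The function names, the 0.5 threshold, and the toy label lists are assumptions made for this example. They are not taken from the paper, and the paper's two-layer ensemble may work differently.

```python
from itertools import combinations


def co_association(labelings):
    """Return, for each document pair, the fraction of base clusterings
    that place both documents in the same cluster."""
    n = len(labelings[0])
    counts = {}
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def consensus_clusters(labelings, threshold=0.5):
    """Link documents whose co-association exceeds the threshold and
    return the connected components as consensus clusters."""
    n = len(labelings[0])
    parent = list(range(n))  # union-find forest over document indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), score in co_association(labelings).items():
        if score > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for doc in range(n):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


# Hypothetical cluster labels for six documents from three base clusterings
# (the variable names only mirror the method and data types in the abstract).
lda_text = [0, 0, 1, 1, 2, 2]
kmeans_text = [0, 0, 0, 1, 2, 2]
kmeans_anchor = [1, 1, 0, 0, 2, 2]

clusters = consensus_clusters([lda_text, kmeans_text, kmeans_anchor])
print(clusters)  # [[0, 1], [2, 3], [4, 5]]
```

In this toy input, documents 2 and 3 are grouped together by two of the three base clusterings, so they are merged even though one K-means labeling separates them. A diversified ranking could then pick top-scoring documents from each consensus cluster in turn, but that step is outside this sketch.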

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE