Accession Number:

ADA512724

Title:

THUIR at TREC2008: Blog Track

Descriptive Note:

Conference paper

Corporate Author:

TSINGHUA UNIV BEIJING (CHINA) NATIONAL LAB FOR INFORMATION SCIENCE AND TECHNOLOGY

Report Date:

2008-11-01

Pagination or Media Count:

6

Abstract:

This is the second year that the IR groups of Tsinghua University have participated in the TREC Blog Track. Unlike the previous year, TREC introduced a new task, the polarity finding task, so we focus on three main tasks this year. The opinion retrieval task involves locating blog posts that express an opinion about a given target. The target can be a traditional named entity -- a name of a person, location, or organization -- but also a concept such as a type of technology, a product name, or an event. The topic of the post does not necessarily have to be the target, but an opinion about the target must be present in the post or in one of the comments to the post. The polarity task is to locate blog posts that express either a positive or a negative opinion about a target. For the relevance task, a multi-field relevance ranking based on a probabilistic retrieval model was used. Both feed content and permalink content were used. Two kinds of information fusion were tested. One combines the results obtained on the two parts; the other combines the two corpora in the weighting phase with improved algorithms. Experimental results on the training set showed that both methods were effective and that the second seemed more stable. For the opinion finding task, the emphasis is on combining the relevance score and the opinion score within a unified generation model. The final score of a document is a quadratic combination of the sentiment score given by an opinion generation model and the relevance score given by a document generation model. HowNet was used as the sentiment lexicon. For the polarity task, several algorithms based on the co-occurrence frequency of sentiment words were implemented. The selection of the sentiment dictionaries and the effect of the co-occurrence window size were studied. An approach that applies polarity words as query terms to the first-step relevance results was also tried.
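The abstract does not give the exact formulas, so the following is only a rough sketch of the two ideas it names: counting sentiment words that co-occur with the target inside a window, and combining an opinion score with a relevance score in a second-degree (product) form. The tiny lexicon, the function names, the window default, and the weighting parameter `lam` are all assumptions made for illustration; the paper itself uses HowNet and its own generation models.

```python
import re

# Hypothetical toy lexicon (assumption); the paper uses HowNet instead.
SENTIMENT_WORDS = {"good", "great", "love", "bad", "awful", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def opinion_score(tokens, target_terms, window=5):
    """Fraction of target occurrences that have a sentiment word
    within `window` tokens on either side (assumed scoring rule)."""
    targets = {t.lower() for t in target_terms}
    positions = [i for i, tok in enumerate(tokens) if tok in targets]
    if not positions:
        return 0.0
    hits = 0
    for i in positions:
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        if any(tok in SENTIMENT_WORDS for tok in tokens[lo:hi]):
            hits += 1
    return hits / len(positions)


def combined_score(relevance, opinion, lam=0.5):
    """Assumed quadratic form: relevance * (lam * opinion + (1 - lam)).
    The product of the two scores makes the expression second degree."""
    return relevance * (lam * opinion + (1.0 - lam))


tokens = tokenize("I love this phone")
print(opinion_score(tokens, ["phone"], window=2))
print(combined_score(2.0, opinion_score(tokens, ["phone"], window=2)))
```

Under this assumed form, a document with no detected opinion keeps a fraction `1 - lam` of its relevance score, while a strongly opinionated one keeps all of it; varying `window` is one simple way to study the window-size effect the abstract mentions.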

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE