Accession Number:

ADA517717

Title:

Strategies for Effective Chemical Information Retrieval

Descriptive Note:

Conference paper

Corporate Author:

PURDUE UNIV LAFAYETTE IN DEPT OF COMPUTER SCIENCES

Personal Author(s):

Report Date:

2009-11-01

Pagination or Media Count:

7.0

Abstract:

We participated in the technology survey and prior art search subtasks of the TREC 2009 Chemical IR Track. This paper describes the methods developed for these two tasks. For the technology survey task, we propose a method that constructs highly structured queries to retrieve over different fields of chemical patents and documents with different weights. The proposed method (i) enriches these structured queries with synonyms of the identified chemicals, and (ii) uses simple entity recognition to extract information for increasing or decreasing the weights of some terms and for filtering documents out of the ranked list. For the prior art search task, we propose an automated query generation method that transforms a query patent into a search query by using all title words and selecting sets of terms from the claims, abstract, and description fields of the query patent. Chemical entities are extracted from the selected terms, and synonyms for the identified entities are added from PubChem. Structured queries are then formed to retrieve over different fields of documents with different weights. Furthermore, a post-processing step (i) filters some of the retrieved documents out of the ranked list because of date constraints, and (ii) uses the IPC similarity between the query patent and each retrieved patent to re-rank the retrieved documents. Empirical results demonstrate the effectiveness of these methods in both tasks.
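The prior art post-processing step described in the abstract (date filtering followed by IPC-based re-ranking) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how dates are compared, how IPC similarity is measured, or how it is combined with the retrieval score. The `Patent` record, its `priority_date` field, the truncated-code Jaccard measure in `ipc_similarity`, the min-max normalization, and the interpolation weight `alpha` are therefore all hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Patent:
    doc_id: str
    # Hypothetical field: the date used for the prior-art date constraint.
    priority_date: date
    ipc_codes: set[str] = field(default_factory=set)


def ipc_similarity(a: set[str], b: set[str], level: int = 4) -> float:
    """Jaccard overlap of IPC codes truncated to `level` characters.

    level=4 keeps the IPC subclass (e.g. "C07D"). This measure is an
    assumption for illustration, not necessarily the one used in the paper.
    """
    ta = {code[:level] for code in a}
    tb = {code[:level] for code in b}
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def post_process(query: Patent,
                 ranked: list[tuple[Patent, float]],
                 alpha: float = 0.5) -> list[tuple[str, float]]:
    """Filter by date, then re-rank by mixing retrieval and IPC scores.

    `alpha` (hypothetical) weights the normalized retrieval score and
    (1 - alpha) weights the IPC similarity to the query patent.
    """
    # (i) Assumed date constraint: keep only documents dated before the query.
    kept = [(doc, s) for doc, s in ranked
            if doc.doc_id != query.doc_id
            and doc.priority_date < query.priority_date]
    if not kept:
        return []
    # Min-max normalize retrieval scores so they share the [0, 1] IPC scale.
    lo = min(s for _, s in kept)
    hi = max(s for _, s in kept)
    span = hi - lo
    rescored = []
    for doc, s in kept:
        norm = (s - lo) / span if span > 0 else 1.0
        ipc = ipc_similarity(query.ipc_codes, doc.ipc_codes)
        rescored.append((doc.doc_id, alpha * norm + (1 - alpha) * ipc))
    # (ii) Re-rank by the combined score, highest first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


# Toy example with made-up documents and scores.
query = Patent("Q", date(2005, 1, 1), {"C07D 401/04", "A61K 31/44"})
ranked = [
    (Patent("C", date(2006, 3, 1), {"C07D 401/04"}), 15.0),  # later than query: dropped
    (Patent("B", date(2004, 6, 1), {"G06F 17/30"}), 12.0),
    (Patent("A", date(2003, 2, 1), {"C07D 213/00", "A61K 9/20"}), 11.0),
    (Patent("D", date(2001, 9, 1), {"C07D 401/04"}), 10.0),
]
print(post_process(query, ranked))
# [('A', 0.75), ('B', 0.5), ('D', 0.25)]
```

In this toy run, document B has the highest retrieval score among the surviving documents, but A moves to the top because it shares both IPC subclasses with the query patent. The linear interpolation keeps the retrieval score as the primary signal while letting classification overlap break close calls. The balance between the two is set by the assumed `alpha`.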

Subject Categories:

  • Information Science
  • Computer Programming and Software
  • Test Facilities, Equipment and Methods
  • Sociology and Law
  • Industrial Chemistry and Chemical Processing

Distribution Statement:

APPROVED FOR PUBLIC RELEASE