PRIS at 2012 TREC Medical Track: Query Expansion, Retrieval and Ranking
BEIJING UNIV OF POSTS AND TELECOMMUNICATIONS (CHINA)
The official datasets are in XML format, so they must be parsed before indexing. We use Lucene for indexing and searching, and Jakarta Commons Digester (hereafter "Digester") to parse the XML documents. Digester converts each XML document into a Java object, from which we read the fields we need. In addition, we process the reporttext tag in the XML documents to extract age and sex information, which are very important fields for the search task.
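The parsing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: to stay self-contained it swaps in the JDK's built-in DOM parser where the abstract names Digester, and it omits Lucene indexing entirely. The class name `ReportFieldExtractor`, the sample report layout, and the two regular expressions are assumptions made for illustration; only the `reporttext` element name comes from the text.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls age and sex out of the <reporttext> element.
final class ReportFieldExtractor {

    // Assumed patterns; real report text would need much broader coverage.
    private static final Pattern AGE =
            Pattern.compile("(\\d{1,3})[- ]year[- ]old", Pattern.CASE_INSENSITIVE);
    private static final Pattern SEX =
            Pattern.compile("\\b(male|female|man|woman)\\b", Pattern.CASE_INSENSITIVE);

    // Returns a map that contains "age" and/or "sex" when they are found.
    static Map<String, String> extract(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted input cannot load external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse report XML", e);
        }

        NodeList nodes = doc.getElementsByTagName("reporttext");
        if (nodes.getLength() == 0) {
            return fields; // No report text, so nothing to extract.
        }
        String text = nodes.item(0).getTextContent();

        Matcher age = AGE.matcher(text);
        if (age.find()) {
            fields.put("age", age.group(1));
        }

        Matcher sex = SEX.matcher(text);
        if (sex.find()) {
            // Normalize the first mention to "male" or "female".
            String word = sex.group(1).toLowerCase();
            boolean female = word.equals("female") || word.equals("woman");
            fields.put("sex", female ? "female" : "male");
        }
        return fields;
    }
}
```

Calling `ReportFieldExtractor.extract(xml)` on a report yields a small map of derived fields. In a pipeline like the one described, such values could be stored as separate indexed fields alongside the report text so that queries can filter on them.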
- Information Science