Accession Number:



NLM at TREC 2012 Medical Records Track

Descriptive Note:

Conference paper

Corporate Author:


Report Date:


Pagination or Media Count:



The NLM team used the relevance judgments for the 2011 Medical Records track, which focused on finding patients eligible for clinical studies, to analyze the components of our 2011 systems. The analysis showed that, for some topics, the components provided moderate improvements over the baseline established by submitting the 2011 topics as is to Lucene, and that they did not harm the results for any other topics. Our experiments confirmed that implementing methods motivated by clinical text processing experience, such as negation detection and section splitting, could improve the identification of patients who meet complex criteria for inclusion in cohort studies. We therefore largely used the 2011 system with minor modifications for document processing. We submitted three automatic runs: an Essie baseline run and two Lucene runs that used the 2011 system with minor modifications. We also submitted an interactive run for which the queries were interactively modified using Essie until either the top ten retrieved documents appeared mostly relevant or no relevant documents could be found. Our interactive queries submitted to Essie significantly outperformed all our other runs and were significantly above the medians for all submission types, achieving 0.37 infAP, 0.68 infNDCG, 0.75 P10, and 0.48 R-prec. Interestingly, the values of the two metrics common to the two years of this track are very close to the values achieved in 2011. The hypothetical overall-best and best-manual performances are significantly better than our interactive run. Our Lucene run that used the topic frames and web-based expansion is significantly better than the Lucene baseline run on all metrics and than the medians on all metrics but P10, but it is not significantly better than our other automatic runs. Our other automatic runs are not significantly above the medians. As in 2011, we conclude that the existing search engines are mature enough to support cohort selection tasks, and the quality of the queries could be

Subject Categories:

  • Information Science
  • Medicine and Medical Research

Distribution Statement: