Accession Number:



Text Mining the Biomedical Literature

Descriptive Note:

Research rept.

Corporate Author:


Personal Author(s):

Report Date:


Pagination or Media Count:



Text mining of the biomedical literature reveals patterns of relationships among concepts, people, and institutions, offering enhanced medical/technical intelligence unobtainable by other means. This report describes a wide range of text mining capabilities. Section 1 covers biomedical knowledge management and the role of text mining within it, and describes the cultural changes and global agreements required before the full power of text mining can be exploited. The next two sections address information retrieval issues. Section 2 describes the extraction of useful information from the published biomedical literature. Section 3 describes the information content of the different record fields in a major medical database. The next four sections address computational linguistics issues, especially the identification of patterns and relationships in text. Section 4 outlines a family of methods for generating radical biomedical discoveries from the literature. Section 5 shows how increasing specialization within the biomedical community creates roadblocks to accelerating radical discovery, and recommends ways to eliminate them. Section 6 describes the detection of unexpected asymmetries in the biomedical literature, with a specific example: detecting asymmetries in cancer incidence in bilateral organs. Section 7 describes a unique approach for removing words/phrases of low technical content, thereby improving the quality of the resulting technical taxonomies. Section 8 describes the use and misuse of citation analysis in biomedical text mining. Section 9 describes citation mining. Section 10 describes the use of citation analysis to evaluate the quality of research performers. Section 11 presents a systematic approach for defining the seminal literature of any biomedical topic. Sections 12 and 13 describe the differences between highly and poorly cited biomedical articles, with specific case studies from leading medical journals.

Subject Categories:

  • Information Science
  • Linguistics
  • Medicine and Medical Research
  • Cybernetics

Distribution Statement: