Accession Number:

ADA416268

Title:

Science and Technology Text Mining: Origins of Database Tomography and Multi-Word Phrase Clustering

Descriptive Note:

Technical rept. 1995-2003

Corporate Author:

OFFICE OF NAVAL RESEARCH ARLINGTON VA

Personal Author(s):

Report Date:

2003-08-15

Pagination or Media Count:

87

Abstract:

This report first describes the motivations for co-word analysis in support of research policy formulation and research implementation evaluation. It compares co-word analysis with other co-occurrence techniques, such as co-citation and co-nomination analyses. It then traces the origins of co-word analysis in computational linguistics, describes in detail the development of co-word analysis for research evaluation, and concludes by presenting a new approach to co-word analysis for research evaluation: Database Tomography. The report shows that this new approach, which requires no index or key words but works on the text directly, is a useful tool for scanning large bodies of text. It can identify pervasive thrust areas and their interrelationships, and it serves as a starting point for further in-depth analysis of the text. Its value increases as the volume of text grows and as the breadth of topical areas covered by the text exceeds the expertise of a moderate number of expert panels. A single-link clustering example is shown that represents the first use of multi-word technical phrases in modern clustering. 75 refs.

Subject Categories:

  • Information Science
  • Numerical Mathematics
  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE