Science and Technology Text Mining: Origins of Database Tomography and Multi-Word Phrase Clustering
Technical rept. 1995-2003
OFFICE OF NAVAL RESEARCH ARLINGTON VA
This report first describes the motivations for co-word analysis in support of research policy formulation and research implementation evaluation. It compares co-word analysis with other co-occurrence techniques such as co-citation and co-nomination analyses. It then traces the origins of co-word analysis in computational linguistics, describes in detail the development of co-word analysis for research evaluation, and concludes by presenting a new approach to co-word analysis for research evaluation: Database Tomography. The report shows that this new approach, which requires no index or key words but deals with text directly, is a useful tool for scanning large bodies of text. It can identify pervasive thrust areas and their interrelationships, and it serves as a starting point for further in-depth analysis of the text. Its value grows as the volume of text increases and as the breadth of topical areas covered by the text exceeds the expertise of a moderate number of expert panels. A single-link clustering example is presented that represents the first use of multi-word technical phrases in modern clustering. 75 refs.
- Information Science
- Numerical Mathematics
- Statistics and Probability
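
The abstract names two ideas that a small example may make more concrete: counting co-occurrences of multi-word phrases taken directly from text, and grouping those phrases by single-link clustering. The Python sketch below is only an illustration under stated assumptions and is not the report's own procedure. The function names (phrases, cooccurrence, single_link_clusters), the use of whole documents as the co-occurrence unit, the fixed phrase length, and the count threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def phrases(text, n=2):
    """Return the adjacent n-word phrases in text (assumption: no stopword filtering)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def cooccurrence(docs, n=2):
    """Count, for each pair of phrases, how many documents contain both.

    Assumption: the document is the co-occurrence unit; other units
    (sentences, fixed word windows) are equally possible.
    """
    counts = Counter()
    for doc in docs:
        present = sorted(set(phrases(doc, n)))
        counts.update(combinations(present, 2))
    return counts


def single_link_clusters(counts, threshold):
    """Single-link clustering cut at a fixed similarity level.

    Two phrases end up in the same cluster if a chain of pairs, each with
    co-occurrence count >= threshold, connects them. Phrases with no such
    pair are left out of the result.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), count in counts.items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    docs = [
        "neural network training",
        "neural network training data",
        "ocean acoustics model",
        "ocean acoustics model",
    ]
    for cluster in single_link_clusters(cooccurrence(docs), threshold=2):
        print(cluster)
```

Cutting at a single threshold keeps the sketch short. A full single-link hierarchy would repeat the same merge step at successively lower thresholds.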