Rapid Pre-Indexing by Machine
MASSACHUSETTS INST OF TECH CAMBRIDGE ELECTRONIC SYSTEMS LAB
The report describes the development of a new method of machine subject indexing for documents in the Project Intrex catalog. The purpose of the system is to allow new documents to be placed online quickly in the computer-stored Intrex catalog. The system uses the human-generated subject terms of existing Intrex documents as a basis for generating index terms for new documents. It operates on only the title and abstract of a document when generating a pre-index for that document. Documents that already had human-generated subject indexes were analyzed by comparing their titles and abstracts to their subject indexes. A large dictionary of word-usage data was obtained from these comparisons, and it served as a guide for the later pre-indexing of new documents. Three variations of the automatic pre-indexing method were developed, tested, and evaluated. Two of the methods show promise for operational use in the Intrex system.
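The dictionary-driven approach summarized above can be illustrated with a minimal sketch. The abstract does not give the report's actual scoring rules or describe its three variations, so everything here is an assumption made for illustration: the function names (`tokenize`, `build_dictionary`, `pre_index`), the word-level statistic (how often a word in a title or abstract also appears in that document's human-assigned index terms), the `threshold` and `min_count` parameters, and the toy corpus.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(indexed_docs):
    """Collect word-usage data from documents that already have human indexes.

    indexed_docs: iterable of (title_and_abstract, human_index_terms) pairs.
    Returns {word: [docs_containing_word, docs_where_word_is_also_indexed]}.
    This statistic is a hypothetical stand-in for the report's dictionary.
    """
    stats = defaultdict(lambda: [0, 0])
    for text, index_terms in indexed_docs:
        index_words = set()
        for term in index_terms:
            index_words.update(tokenize(term))
        for word in set(tokenize(text)):
            stats[word][0] += 1
            if word in index_words:
                stats[word][1] += 1
    return dict(stats)


def pre_index(text, dictionary, threshold=0.5, min_count=1):
    """Propose index words for a new document from its title and abstract.

    A word is kept when it was seen at least min_count times and was used
    as an index word in at least `threshold` of the documents containing it.
    Both cut-offs are illustrative assumptions, not values from the report.
    """
    selected = []
    for word in dict.fromkeys(tokenize(text)):  # de-duplicate, keep order
        seen, indexed = dictionary.get(word, (0, 0))
        if seen == 0 or seen < min_count:
            continue  # unknown or too-rare word: no evidence either way
        if indexed / seen >= threshold:
            selected.append(word)
    return selected


# Toy corpus (invented for this example): text plus human index terms.
corpus = [
    ("Laser scattering in thin films. The laser beam is measured.",
     {"laser scattering", "thin films"}),
    ("The growth of thin films on silicon",
     {"thin films", "silicon growth"}),
    ("The laser is described", {"laser design"}),
]
dictionary = build_dictionary(corpus)
print(pre_index("The laser damage in silicon films", dictionary))
```

In this sketch, common words such as "the" and "in" drop out because human indexers never selected them, while a word absent from the dictionary (here "damage") is skipped for lack of evidence. A real system would need a policy for such unseen words.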
- Information Science