SEARCH PROCEDURES BASED ON MEASURES OF RELATEDNESS BETWEEN DOCUMENTS
MASSACHUSETTS INST OF TECH CAMBRIDGE PROJECT MAC
A new type of information retrieval system is suggested which uses data generated by the users of the system instead of data generated by indexers. The theoretical model on which the system is based consists of three basic elements. The first element is a measure of the relatedness between pairs of documents, derived from information theory. The second element is a definition, based on that measure, of what constitutes a cluster of interrelated documents. The third element is a procedure which transforms a request for information into a cluster of answer documents. An experimental system was developed to test the model in a realistic environment. It was programmed for the Project MAC time-sharing system and used the physics data file of the Technical Information Project. Citations served as the data base for the measure of relatedness. A file structure and a retrieval language were designed which allowed close man-machine coupling. Retrieval efficiency, measured against known sets, was 60 to 90 percent, and ways of improving this further are suggested.
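The three elements of the model can be illustrated with a minimal sketch. The abstract does not give the report's actual formulas, so everything below is an assumption made for illustration: pointwise mutual information over shared citing sources stands in for the information-theoretic relatedness measure, a fixed threshold stands in for the cluster definition, and a simple expansion from the requested documents stands in for the retrieval procedure. The names relatedness, answer_cluster, citing_sets, and the sample data are hypothetical.

```python
import math
from itertools import combinations


def relatedness(citing_sets, n_sources):
    """Score document pairs by pointwise mutual information (assumed measure).

    citing_sets maps a document id to the set of source ids that cite it.
    Pairs with no shared citing source are left out of the result.
    """
    scores = {}
    for a, b in combinations(sorted(citing_sets), 2):
        joint = len(citing_sets[a] & citing_sets[b])
        if joint == 0:
            continue
        p_a = len(citing_sets[a]) / n_sources
        p_b = len(citing_sets[b]) / n_sources
        p_ab = joint / n_sources
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def answer_cluster(request, scores, threshold):
    """Expand the requested documents along pairs scoring >= threshold.

    This is an assumed cluster definition: every document reachable from
    the request through sufficiently related pairs.
    """
    cluster = set(request)
    frontier = list(request)
    while frontier:
        doc = frontier.pop()
        for (a, b), score in scores.items():
            if score < threshold:
                continue
            if a == doc:
                other = b
            elif b == doc:
                other = a
            else:
                continue
            if other not in cluster:
                cluster.add(other)
                frontier.append(other)
    return cluster


# Hypothetical citation data: 8 citing sources, 5 cited documents.
citing_sets = {
    "D1": {1, 2, 3},
    "D2": {1, 2, 3, 4},
    "D3": {5, 6},
    "D4": {5, 6, 7},
    "D5": {4, 7, 8},
}
scores = relatedness(citing_sets, n_sources=8)
print(sorted(answer_cluster({"D1"}, scores, threshold=0.5)))
```

In this sketch a request is a set of known documents rather than a set of index terms, which matches the abstract's emphasis on user-generated data. Lowering the threshold widens the answer cluster and raising it narrows the cluster.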
- Information Science