OPTIMIZATION AND STANDARDIZATION OF INFORMATION RETRIEVAL LANGUAGE AND SYSTEMS
SPERRY RAND CORP PHILADELPHIA PA UNIVAC DIV
The report analyzes and evaluates methods of organizing data files, primarily for document retrieval applications. Three principal techniques are examined: the Multi-List System, the list-organized file, and the inverted and document-sequenced file. Statistical analyses were made of term associations based on the 599 most common DDC descriptors. Results indicate the need for a large amount of processing against an extensive data base. Since most documents have almost as many groups as index terms, the postulated reduction in lists traversing a given document cannot be realized. Analysis shows that the list-organized file is an amalgamation of the inverted and document-sequenced files, and that maintenance and use of the two separate files is more efficient when requirements cannot be met by the inverted file alone. A technique for optimizing organization of the two files to minimize actual computing and overall elapsed processing times is described. It is viewed as dubious that any particular significance can be attached to a unique index term association. There appears to be potential value in using relationships implicit in the hierarchic structure of a thesaurus, both for processing search requests and to aid in assigning descriptors by such techniques as lowest level indexing.
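As a rough illustration of the two file organizations the abstract contrasts, the following minimal sketch builds a document-sequenced file and an inverted file from the same toy collection and answers a conjunctive request from the inverted file. The sample documents, the terms, and the function names are all hypothetical and are not taken from the report; the report's own optimization technique is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy collection (not from the report): document id -> index terms.
documents = {
    1: ["computers", "information retrieval"],
    2: ["information retrieval", "thesauri"],
    3: ["computers", "thesauri", "indexing"],
}


def build_document_sequenced(docs):
    """Document-sequenced file: records in document order, each listing its terms."""
    return [(doc_id, sorted(terms)) for doc_id, terms in sorted(docs.items())]


def build_inverted(docs):
    """Inverted file: each term maps to the ordered list of documents that carry it."""
    inverted = defaultdict(list)
    for doc_id, terms in sorted(docs.items()):
        for term in terms:
            inverted[term].append(doc_id)
    return dict(inverted)


def search_all(inverted, terms):
    """Conjunctive request: intersect the document lists of every requested term."""
    postings = [set(inverted.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


doc_file = build_document_sequenced(documents)
inv_file = build_inverted(documents)
hits = search_all(inv_file, ["computers", "thesauri"])
```

In this sketch the inverted file alone resolves which documents satisfy the request, while the document-sequenced file would be consulted only when the full term set of a retrieved document is needed.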
- Information Science