GENERATION AND ENCODING OF THE PROJECT INTREX AUGMENTED CATALOG DATA BASE
MASSACHUSETTS INST OF TECH CAMBRIDGE ELECTRONIC SYSTEMS LAB
A flexible, analytically structured catalog-record format was designed to help meet the objectives of the display-oriented Project Intrex augmented catalog experiments. The analytical format, the catalog data elements, and their encoding for machine readability are discussed. The selection of documents from the literature of materials science and engineering for the Intrex data base, the generation of catalog records for those documents, and the initial processing of those records for computer storage are covered. Initial studies made to evaluate the processing of catalog records are also described. One study shows that data input at an on-line terminal in our current MIT CTSS operating environment is twice as expensive as our normal off-line data input using punched paper tape. Attention is also given to the creation, from each document, of a set of complete index term phrases and to the problems of matching these unconstrained terms with similarly unconstrained subject request phrases. Computer programs for phrase decomposition and word stemming, together with interactive man-machine dialog, will help solve the problems of subject retrieval. The main development phase of the experimental time-shared augmented catalog is nearing completion.
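The matching problem named in the abstract (unconstrained index term phrases against unconstrained subject request phrases) can be illustrated with a minimal sketch. It is not the Intrex programs, whose algorithms the abstract does not describe. The suffix list, the stopword list, and the function names `stem`, `decompose`, and `match_score` are all assumptions made for illustration, and the stemmer is a crude longest-suffix stripper rather than any published stemming algorithm.

```python
# Illustrative sketch only: hypothetical names and rules, not the Intrex implementation.

# Assumed suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")

# Assumed list of function words dropped during phrase decomposition.
STOPWORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "to"}


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def decompose(phrase):
    """Break a phrase into a set of stems, ignoring stopwords and punctuation."""
    words = [w.strip(".,;:()") for w in phrase.split()]
    return {stem(w) for w in words if w and w.lower() not in STOPWORDS}


def match_score(index_phrase, request_phrase):
    """Fraction of the request's stems that also occur in the index phrase."""
    index_stems = decompose(index_phrase)
    request_stems = decompose(request_phrase)
    if not request_stems:
        return 0.0
    return len(index_stems & request_stems) / len(request_stems)


if __name__ == "__main__":
    # Differently worded phrases reduce to overlapping stems.
    print(match_score("precipitation hardening of aluminum alloys",
                      "hardened aluminum alloy"))  # prints 1.0
```

Under these assumed rules, "hardening" and "hardened" both reduce to "harden", and "alloys" reduces to "alloy", so a request phrased differently from the index term still matches. A real system would need a more careful stemmer, and the interactive dialog mentioned in the abstract to resolve the cases that stemming alone gets wrong.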
- Information Science
- Computer Hardware
- Computer Systems