The Use of Selected Portions of Technical Documents as Sources of Index Terms and Effect on Input Costs and Retrieval Effectiveness.
Final summary rept. 1 Dec 71-30 Nov 72
DAYTON UNIV OHIO RESEARCH INST
Recall (the retrieval of all available relevant documents) should decrease as the quantity of text serving as a source of index terms is reduced. However, the time for indexing, and therefore the input cost, should also be less, establishing a tradeoff between input cost and retrieval effectiveness. To quantify the effect of restricting the source text on both retrieval effectiveness and input cost, an experiment was designed in which the full technical document text was divided into five categories: (1) title; (2) abstract; (3) table of contents and lists of figures and tables; (4) author-assigned keywords; and (5) the body. An experimental data base was prepared in which the source category of each index term and the indexing time were recorded. Sets of SDI (selective dissemination of information) and retrospective searches were run against the data base, and retrievals were analyzed by category. For the subset of documents retrieved, 81% of the available relevant documents were retrieved from Categories 1-4; the indexing time required for these four categories was only 53% of the total indexing time. For the entire set of documents input into the experimental data base, the portion of indexing time for the first four categories was 60%. It was decided that the body of the document could be excluded as a source of index terms. (Modified author abstract)
- Information Science
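
The tradeoff summarized in the abstract can be illustrated with a minimal sketch. It assumes that each relevant document is tagged with the source categories whose index terms would retrieve it, and that total indexing time is known per category. The function name, the data layout, and every number below are hypothetical and are not taken from the report; the report's own measurement procedure may differ.

```python
def restricted_tradeoff(relevant_hits, indexing_minutes, kept):
    """Return (recall_share, time_share) for a restricted set of source categories.

    relevant_hits: dict mapping a relevant document id to the set of source
        categories whose index terms retrieve that document (hypothetical layout).
    indexing_minutes: dict mapping each category to its total indexing time.
    kept: set of categories retained as sources of index terms.
    """
    # A relevant document stays retrievable if at least one kept category hits it.
    retrievable = sum(1 for cats in relevant_hits.values() if cats & kept)
    recall_share = retrievable / len(relevant_hits)

    # Share of total indexing time spent on the kept categories.
    kept_time = sum(indexing_minutes[c] for c in kept)
    time_share = kept_time / sum(indexing_minutes.values())
    return recall_share, time_share


# Invented example data: categories 1-4 are title, abstract, contents/lists,
# and keywords; category 5 is the body.
example_hits = {
    "doc1": {1, 5},
    "doc2": {2},
    "doc3": {5},        # retrievable only through body terms
    "doc4": {3, 4},
    "doc5": {1, 2, 5},
}
example_minutes = {1: 2, 2: 10, 3: 5, 4: 3, 5: 20}

recall_share, time_share = restricted_tradeoff(
    example_hits, example_minutes, kept={1, 2, 3, 4}
)
print(f"recall share: {recall_share:.0%}, indexing time share: {time_share:.0%}")
```

With these invented inputs, dropping the body loses only the one document that body terms alone retrieve, while halving the indexing time. This is the same shape of result the abstract reports with its own figures.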