UTILITY OF AUTOMATIC CLASSIFICATION SYSTEMS FOR INFORMATION STORAGE AND RETRIEVAL
MOORE SCHOOL OF ELECTRICAL ENGINEERING PHILADELPHIA PA
Large-scale, on-line information storage and retrieval systems pose numerous problems beyond those encountered by smaller systems. A step toward the solution of these problems is presented, along with several demonstrations of its feasibility and advantages. The solution is based on a posteriori automatic classification of the document collection. Feasibility is demonstrated by automatically classifying a file of 50,000 document descriptions. The advantages of automatic classification are demonstrated by establishing methods for measuring the quality of classification systems and applying these measures to a number of different classification strategies. By indexing the 50,000 documents by two independent methods, one manual and one automatic, it is shown that these advantages do not depend on the indexing method used. Among the automatic classification algorithms studied, one algorithm, CLASFY, consistently outperformed the others.
- Information Science