BREAKING THE COST BARRIER IN AUTOMATIC CLASSIFICATION
SYSTEM DEVELOPMENT CORP SANTA MONICA CA
A low-cost automatic classification method is reported that uses computer time in proportion to N log N, where N is the number of information items and the base of the logarithm is a parameter. Some barriers besides cost are treated briefly in the opening section, including types of intellectual resistance to the idea of doing classification by content-word similarity. The second section explains the basic processes of document grouping by similarity and discusses the advantages of the reported method over the methods commonly tried in experiments. The operation of an iterative procedure that uses word profiles to progressively improve the grouping of content-word lists is described. Some possible applications aside from document classification are then enumerated. The final section begins by presenting theoretical underpinnings that explain the form taken by the components of the method. An account of the struggle to make the method work is sketched, followed by a cycle-by-cycle description of a feasibility demonstration. The conclusion states that mere cheapness is not enough and analyzes what researchers and developers might have to do before user acceptance of automatic classification can be assured.
- Information Science