RESEARCH ON AUTOMATIC CLASSIFICATION, INDEXING AND EXTRACTING
Annual progress rept.
IBM FEDERAL SYSTEMS DIV GAITHERSBURG MD
To contribute to the success of several studies of automatic classification, indexing, and extracting currently in progress, and to further our theoretical and practical understanding of textual item distributions, this year's funds under Contract No. Nonr 445600 have been applied to the development of a frequency program capable of supplying these types of information. The program, planned for the System/360, will provide numerous user options covering: the format of the input text; the definition of a countable item (e.g., a word may be specified as any string of characters between delimiters such as comma, space, period, or any combination thereof); the definition of a textual unit over which frequencies are to be subtotaled (e.g., sentence, paragraph, or document); the types of data to be output; and the machine configuration to be used. Facilities will also be provided for incorporating user-supplied routines to perform special functions such as word-pair generation and suffix normalization. Progress has been made on the design of the Dictionary Build module of the frequency program. The main purpose of the program is to provide an output containing an ordered list of the items, their frequencies, and any special tags desired by the user.
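The options described above can be illustrated with a minimal modern sketch. It is not the System/360 program or its Dictionary Build module; the function names (count_items, ordered_list), the parameter names, and the choice of a blank line as the default unit boundary are all assumptions made for illustration only.

```python
import re
from collections import Counter


def count_items(text, item_delimiters=" ,.", unit_delimiter="\n\n", normalize=None):
    """Count items per textual unit and overall.

    Hypothetical sketch: an item is any non-empty string between
    delimiter characters; a unit is any span between unit delimiters.
    `normalize` stands in for a user-supplied routine (e.g. suffix
    normalization) applied to each item before counting.
    """
    # Any run of delimiter characters acts as a single break between items.
    splitter = re.compile("[" + re.escape(item_delimiters) + "]+")
    totals = Counter()
    subtotals = []
    for unit in text.split(unit_delimiter):
        items = [s for s in splitter.split(unit) if s]
        if normalize is not None:
            items = [normalize(s) for s in items]
        unit_counts = Counter(items)
        subtotals.append(unit_counts)   # frequencies subtotaled over this unit
        totals.update(unit_counts)      # running totals over the whole text
    return subtotals, totals


def ordered_list(totals, tags=None):
    """Return (item, frequency, tag) triples sorted alphabetically by item."""
    tags = tags or {}
    return [(item, freq, tags.get(item)) for item, freq in sorted(totals.items())]


# Example: two paragraph-sized units separated by a blank line.
sample = "the cat, the dog.\n\nthe end."
per_unit, overall = count_items(sample)
print(ordered_list(overall))
# [('cat', 1, None), ('dog', 1, None), ('end', 1, None), ('the', 3, None)]
```

Passing a callable as normalize plays the role of the user-supplied routines mentioned in the abstract; the tags argument is a plain mapping from item to whatever label the user wants carried into the output list.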
- Information Science