Accession Number:

AD0281909

Title:

STATISTICAL SEMANTICS

Descriptive Note:

Corporate Author:

SYSTEM DEVELOPMENT CORP SANTA MONICA CALIF

Personal Author(s):

Report Date:

1962-07-11

Pagination or Media Count:

5

Abstract:

Three small libraries in physics, in European current events, and in information retrieval are represented by three groups of 100 lists, each of which simulates the output of a computer program that determines the 12 most frequent content words of a document. Homographs of words that occur in any two of the three libraries are inventoried to ascertain how cleanly the homographs are separated as a consequence of separating the libraries from each other. Three kinds of homograph separation are specified: doubtful, partial, and clean-cut. The last was found to predominate in this study, as a result of the variegation and small size of the libraries. It is hypothesized that for statistically separable libraries somewhat closer in subject matter and/or larger, lower percentages of clean-cut separations should occur, but that there are countertrends which could make these effects less important. (Author)
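
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the program used in the report: the stop list, the tokenization, and the names `STOP_WORDS`, `top_content_words`, and `shared_words` are assumptions made for illustration only. The sketch covers the two steps the abstract states (listing the 12 most frequent content words of a document, then inventorying words that occur in at least two libraries); it does not attempt to classify separations as doubtful, partial, or clean-cut, because the abstract does not give the criteria.

```python
import re
from collections import Counter

# Hypothetical stop list. The abstract does not say how "content words"
# were distinguished from function words, so this set is an assumption.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "or", "to",
              "is", "are", "for", "by", "on"}


def top_content_words(text, n=12):
    """Return the n most frequent non-stop words of a document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def shared_words(libraries):
    """Map each word found in two or more libraries to those libraries.

    `libraries` maps a library name to its lists of frequent words,
    one list per document.
    """
    seen = {}
    for name, word_lists in libraries.items():
        for word_list in word_lists:
            for word in word_list:
                seen.setdefault(word, set()).add(name)
    return {w: sorted(names) for w, names in seen.items() if len(names) >= 2}
```

Words returned by `shared_words` are only candidates for the homograph inventory: a word shared by two libraries may or may not carry different senses in each, and that judgment is the part of the study this sketch leaves out.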

Subject Categories:

Distribution Statement:

APPROVED FOR PUBLIC RELEASE