Determining the Number of Subpopulations.
Technical rept. 1 Sep 84-1 Oct 86
MASSACHUSETTS INST OF TECH CAMBRIDGE STATISTICS CENTER
The aim of cluster analysis is to find groups of similar objects. An important problem in clustering is determining the number of clusters. This becomes a statistical inference problem when the objects to be clustered are sampled from an underlying population. This thesis addresses the problem of inferring, from a data sample, the location and number of subpopulations in the underlying population. It does so by introducing a new measure of the degree of multimodality of a density f and then using the sample value of this measure as a basis for determining the number of modes, or subpopulations, when sampling from f. Each subpopulation is assumed to correspond to a mode of the underlying density f. Each population cluster is then characterized as a modal region: a high-density region surrounded by low-density regions. Two methods of characterizing these modes are considered. The first measures relative distance, that is, how far apart two modes are. The second measures how flat, or how close to uniform, the corresponding modal regions are. Combining these two concepts into a single parameter yields a measure of the degree of multimodality of a density f. In general, for a density with two or more modes, the degree of multimodality measures how far the weakest and flattest mode is from the other modes and how close it is to uniform.
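To make the idea of a mode as a high-density region surrounded by low-density regions concrete, the following is a minimal Python sketch that counts the local maxima of a Gaussian kernel density estimate. It is not the multimodality measure developed in the thesis, which the abstract does not specify. The kernel estimator, the function names `gaussian_kde_on_grid` and `count_modes`, the bandwidth, and the synthetic sample are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` at each grid point."""
    # Standardized distance from every grid point to every sample point.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def count_modes(density):
    """Count strict interior local maxima of a density evaluated on a grid."""
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())

# Synthetic, deterministic sample (an assumption for illustration):
# two well-separated groups of 50 points each.
sample = np.concatenate([np.linspace(-4.0, -2.0, 50), np.linspace(2.0, 4.0, 50)])
grid = np.linspace(-8.0, 8.0, 801)

density = gaussian_kde_on_grid(sample, grid, bandwidth=0.5)
print("estimated number of modes:", count_modes(density))
```

For this sample the estimate has one peak per group. The count depends on the chosen bandwidth: a very small bandwidth produces spurious peaks and a very large one merges distinct groups, which is part of why a principled measure of multimodality is needed.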
- Statistics and Probability
- Active and Passive Radar Detection and Equipment