Accession Number:
ADA191446
Title:
Tuning a Major Part of a Clustering Algorithm.
Descriptive Note:
Technical rept.
Corporate Author:
PRINCETON UNIV NJ DEPT OF STATISTICS
Report Date:
1988-02-01
Pagination or Media Count:
64
Abstract:
Most proposals for clustering algorithms have been based on introspection. Few proposed algorithms have had their performance studied. Our approach involves (a) striving to avoid comparing distances in remote parts of the data, because metrics deserve only minimum trust, and (b) using a stochastically defined test bed to measure, and where possible understand, the performance of an evolving algorithm, with the intent of using our understanding to modify it in such a way as to improve its performance. This test bed involves 3 circular Gaussian samples, of size 50 each, centered at the vertices of an equilateral triangle of side tσ. Its use assumes that a 3-group answer is being sought. Thus we are concerned with only a part of the clustering process. Our early algorithms begin to misbehave in the range 5 ≤ t ≤ 7. Our successive steps of improvement work at smaller and smaller t. The last version we have tried still performs usefully (median misclassification about 16%) at t = 2.7, where knowledge of the three populations would only let us hold misclassification to a median of 13.3%. Comparison with a Gaussian maximum likelihood algorithm on the same set of triple samples shows only slightly better performance than for our algorithm.
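The test bed described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the report's code: the function names, the seeding scheme, and the number of trials are hypothetical, and the "known populations" benchmark is assumed here to mean assigning each point to the nearest true center, which the abstract does not spell out. The report's own clustering algorithm is not reproduced.

```python
import math
import random
import statistics


def make_test_bed(t, sigma=1.0, n=50, seed=0):
    """Draw n points from each of three circular Gaussians centered at the
    vertices of an equilateral triangle with side t * sigma."""
    rng = random.Random(seed)
    side = t * sigma
    centers = [
        (0.0, 0.0),
        (side, 0.0),
        (side / 2.0, side * math.sqrt(3.0) / 2.0),
    ]
    points, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(n):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(k)
    return centers, points, labels


def nearest_center_error(centers, points, labels):
    """Fraction of points misclassified when each point is assigned to the
    nearest TRUE center (an assumed stand-in for 'known populations')."""
    wrong = 0
    for (x, y), k in zip(points, labels):
        d2 = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        if d2.index(min(d2)) != k:
            wrong += 1
    return wrong / len(points)


def median_error(t, trials=100):
    """Median misclassification fraction over independently seeded test beds."""
    return statistics.median(
        nearest_center_error(*make_test_bed(t, seed=s)) for s in range(trials)
    )
```

Calling `median_error(2.7)` and `median_error(7.0)` gives a rough sense of how much harder the problem becomes as t shrinks. The values depend on the seeds and on the assumed benchmark rule, so they need not match the figures quoted in the abstract.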
Distribution Statement:
APPROVED FOR PUBLIC RELEASE