Accession Number:

ADA191446

Title:

Tuning a Major Part of a Clustering Algorithm.

Descriptive Note:

Technical rept.,

Corporate Author:

PRINCETON UNIV NJ DEPT OF STATISTICS

Personal Author(s):

Report Date:

1988-02-01

Pagination or Media Count:

64.0

Abstract:

Most proposals for clustering algorithms have been based on introspection Few proposed algorithms have had their performance studied. This approach involves a striving to avoid comparing distances on remote parts of the data because metrics deserve only minimum trust, and b using a stochastically defined test bed to measure, and where possible understand, the performance of an evolving algorithm, with the intent of using our understanding to modify it in such a way as to improve its performance. This test bed involves 3 circular Gaussian samples, of size 50 each, centered at the vertices of an equilateral triangle of side tsigma. Its use assumes that a 3-group answer is being sought. Thus the only concerned is with a part of the clustering process. Our early algorithms begin to misbehave in the range 5 or t or 7. Our successive steps of improvement work at smaller and smaller t. The last version we have tried still performs usefully median misclassification about 16 at t 2.7, where knowledge of three populations would only let us hold misclassification to a median of 13.3. Comparison with a Gaussian maximum likelihood algorithm on the same set of triple samples shows only slightly better performance than for our algorithm.

Subject Categories:

  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE