Accession Number:

ADA458738

Title:

The Analysis of a Simple k-Means Clustering Algorithm

Descriptive Note:

Technical rept.

Corporate Author:

MARYLAND UNIV COLLEGE PARK DEPT OF COMPUTER SCIENCE

Report Date:

2000-01-01

Pagination or Media Count:

21.0

Abstract:

K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in Rexp d and an integer k, the problem is to determine a set of k points Rexp d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyds algorithm. In this paper, we present a simple and efficient implementation of Lloyds k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithms running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from applications in color quantization, compression, and segmentation.

Subject Categories:

  • Numerical Mathematics
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE