Nearest Neighbor Classification Using a Density Sensitive Distance Measurement
NAVAL POSTGRADUATE SCHOOL MONTEREY CA MODELING VIRTUAL ENVIRONMENTS AND SIMULATION (MOVES)
This work proposes a density sensitive distance measurement that accounts for the density of the underlying dataset, so that measured distances better reflect the shape of the data. Kernel density estimation, with kernel bandwidths determined by k-nearest neighbor distances, is used to approximate the density of the underlying dataset. A scale is applied to the resulting kernel density estimate, and a line integral is performed along its surface to obtain a density sensitive distance. The utility of the proposed measurement is tested using supervised learning. k-Nearest Neighbor classifiers using the proposed density sensitive distance measurement and using Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and the MNIST Database of Handwritten Digits. For perspective, these classifiers are also compared to Support Vector Machine and Random Forests classifiers. Stratified 10-fold cross validation is used to estimate the generalization error of each classifier. In all comparisons, k-Nearest Neighbor classification using the proposed density sensitive distance measurement had lower generalization error than k-Nearest Neighbor classification using Euclidean distance. For the MNIST dataset, k-Nearest Neighbor classification using the density sensitive distance measurement also had lower generalization error than both Support Vector Machine and Random Forests classification.
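The pipeline summarized above (k-nearest neighbor bandwidths, kernel density estimate, scaling, line integral) can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the report's implementation, and it rests on several assumptions: a Gaussian kernel, a straight path between the two points, a distance defined as the arc length of that path lifted onto the scaled density surface, and a fixed number of integration steps. The function names and the `scale` and `steps` parameters are hypothetical.

```python
import math

def knn_bandwidths(points, k):
    """Assumed rule: each point's bandwidth is the distance to its k-th nearest neighbor."""
    bandwidths = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        bandwidths.append(max(dists[k - 1], 1e-12))  # guard against duplicate points
    return bandwidths

def density(x, points, bandwidths):
    """Variable-bandwidth Gaussian kernel density estimate at x (kernel choice is an assumption)."""
    d = len(x)
    total = 0.0
    for p, h in zip(points, bandwidths):
        norm = (2.0 * math.pi) ** (d / 2.0) * h ** d
        total += math.exp(-0.5 * (math.dist(x, p) / h) ** 2) / norm
    return total / len(points)

def density_sensitive_distance(a, b, points, bandwidths, scale=1.0, steps=50):
    """Approximate arc length of the straight path a -> b on the scaled density surface.

    The path is split into `steps` segments; each segment's length combines the
    horizontal move with the change in scaled density height (assumed formulation).
    """
    length = 0.0
    prev_x, prev_z = a, scale * density(a, points, bandwidths)
    for s in range(1, steps + 1):
        t = s / steps
        x = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
        z = scale * density(x, points, bandwidths)
        length += math.hypot(math.dist(prev_x, x), z - prev_z)
        prev_x, prev_z = x, z
    return length

# Toy data: two small clusters (illustrative values only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
bandwidths = knn_bandwidths(points, k=2)
dsd = density_sensitive_distance(points[0], points[3], points, bandwidths, scale=1.0)
```

Under this reading, setting `scale` to zero flattens the surface and the measurement reduces to Euclidean distance, while larger scales lengthen paths along which the estimated density changes. A function like `density_sensitive_distance` could then stand in for the Euclidean metric when ranking neighbors in a k-Nearest Neighbor classifier.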
- Numerical Mathematics
- Theoretical Mathematics
- Computer Programming and Software