Integration of Clustering with Semantics Learning for Massive Categorical and Mixed Data
Technical Report, 16 Aug 2017, 15 Aug 2019
Japan Advanced Institute of Science and Technology, Nomi, Japan
The PI has had very good performance with this grant. The main objective of this research was to conduct a systematic study of data-driven similarity measures based on information theory, and of kernel-based methods for representing cluster centers of categorical objects, so as to ultimately develop a k-means-like clustering methodology capable of handling missing data in categorical and mixed datasets. First, the PI proposed a new unsupervised similarity measure for categorical data based on an information-theoretic approach. Second, building on this similarity measure, the PI proposed a novel k-means-like clustering framework that uses kernel-based methods to represent cluster centers. Third, the PI developed the so-called kCCM algorithm for clustering categorical data with missing values. Finally, the PI extended the proposed k-means-like clustering framework to make it applicable to mixed numeric and categorical datasets with missing data. The research produced 3 journal papers and 7 conference/workshop papers as a direct result of this grant. One graduate student was supported by this research grant.
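The general shape of a k-means-like procedure for categorical data with missing values can be illustrated with a minimal sketch. This is not the kCCM algorithm or the information-theoretic similarity measure from the report, neither of which is specified here. It assumes, purely for illustration, that each cluster center is a set of per-attribute probability estimates smoothed with an Aitchison-Aitken style kernel (smoothing parameter `lam`), that missing values are marked with `None` and skipped, and that dissimilarity is the mean of `1 - p(value)` over the observed attributes. All function and parameter names are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing categorical value


def smoothed_center(rows, domains, lam=0.2):
    """Per-attribute probability estimates for one cluster.

    Aitchison-Aitken style kernel (an assumption of this sketch): weight
    1 - lam on the observed category, lam spread evenly over the others.
    Missing values are ignored when counting.
    """
    center = []
    for j, domain in enumerate(domains):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        counts = Counter(observed)
        n = len(observed)
        probs = {}
        for c in domain:
            if n == 0 or len(domain) == 1:
                probs[c] = 1.0 / len(domain)  # nothing to smooth: uniform
            else:
                f = counts[c] / n
                probs[c] = (1 - lam) * f + lam / (len(domain) - 1) * (1 - f)
        center.append(probs)
    return center


def dissimilarity(row, center):
    """Mean of 1 - p(value) over the attributes observed in `row`."""
    terms = [1.0 - center[j][v] for j, v in enumerate(row) if v is not MISSING]
    return sum(terms) / len(terms) if terms else 0.0


def cluster(data, seeds, lam=0.2, max_iter=20):
    """k-means-like loop: assign to the nearest center, then re-estimate."""
    domains = [sorted({r[j] for r in data if r[j] is not MISSING})
               for j in range(len(data[0]))]
    # Each initial center is estimated from a single seed object.
    centers = [smoothed_center([data[i]], domains, lam) for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda k: dissimilarity(r, centers[k]))
                      for r in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [r for r, lab in zip(data, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = smoothed_center(members, domains, lam)
    return labels, centers


# Toy data (made up for this example); None marks a missing value.
data = [
    ("red", "small", "round"),
    ("red", "small", None),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", None, "square"),
    ("blue", "large", "round"),
]
labels, centers = cluster(data, seeds=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Representing a center as a smoothed distribution rather than a single mode keeps every category at non-zero probability, so an object with a rare or partially missing description can still be compared against every cluster.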
- Computer Programming and Software