Accession Number:



Integration of Clustering with Semantics Learning for Massive Categorical and Mixed Data

Descriptive Note:

Technical Report, 16 Aug 2017, 15 Aug 2019

Corporate Author:

Japan Advanced Institute of Science and Technology Nomi Japan

Personal Author(s):

Report Date:


Pagination or Media Count:



The PI has performed very well under this grant. The main objective of this research was to conduct a systematic study of data-driven similarity measures based on information theory, and of kernel-based methods for representing the cluster centers of categorical objects, so as to ultimately develop a k-means-like clustering methodology capable of handling missing data in categorical and mixed datasets. First, the PI proposed a new unsupervised similarity measure for categorical data based on an information-theoretic approach. Second, building on this similarity measure, the PI proposed a novel k-means-like clustering framework that uses kernel-based methods to represent cluster centers. Third, the PI developed the so-called kCCM algorithm for clustering categorical data with missing values. Finally, the PI extended the proposed k-means-like clustering framework to make it applicable to mixed numeric and categorical datasets with missing data. The grant directly resulted in 3 journal papers and 7 conference/workshop papers, and supported one graduate student.
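
The general idea described above can be illustrated with a minimal Python sketch. It is not the report's kCCM algorithm or its information-theoretic similarity measure, neither of which is specified in this record. As assumptions made only for illustration, each cluster center is a per-attribute category distribution smoothed with an Aitchison-Aitken kernel, the dissimilarity is one minus the smoothed probability averaged over observed attributes, and missing values (encoded as None) are simply skipped. The names smoothed_center, dissimilarity, cluster, and the bandwidth lam are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # assumed encoding of a missing categorical value


def smoothed_center(rows, domains, lam=0.1):
    """Per-attribute category distributions of a cluster, kernel-smoothed.

    Uses the Aitchison-Aitken kernel: weight 1 - lam on the matching
    category and lam / (c - 1) on each of the other c - 1 categories.
    """
    center = []
    for j, domain in enumerate(domains):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        counts = Counter(observed)
        n, c = len(observed), len(domain)
        dist = {}
        for v in domain:
            if c == 1:
                dist[v] = 1.0
                continue
            # Relative frequency; fall back to uniform if nothing is observed.
            f = counts[v] / n if n else 1.0 / c
            dist[v] = (1 - lam) * f + (lam / (c - 1)) * (1 - f)
        center.append(dist)
    return center


def dissimilarity(row, center):
    """Average of (1 - smoothed probability) over the observed attributes."""
    terms = [1.0 - center[j][v] for j, v in enumerate(row) if v is not MISSING]
    return sum(terms) / len(terms) if terms else 1.0


def cluster(data, k, n_iter=20, seed=0):
    """k-means-like loop: update centers, then reassign objects."""
    rng = random.Random(seed)
    n_attrs = len(data[0])
    domains = [sorted({r[j] for r in data if r[j] is not MISSING})
               for j in range(n_attrs)]
    labels = [rng.randrange(k) for _ in data]
    for _ in range(n_iter):
        centers = []
        for idx in range(k):
            members = [r for r, lab in zip(data, labels) if lab == idx]
            # Re-seed an empty cluster with one randomly chosen object.
            centers.append(smoothed_center(members or [rng.choice(data)], domains))
        new_labels = [min(range(k), key=lambda i: dissimilarity(r, centers[i]))
                      for r in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Calling cluster(data, k) on a list of equal-length tuples returns one cluster index per object. Extending the sketch to mixed data would require an additional distance term for numeric attributes, which is omitted here.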

Subject Categories:

  • Computer Programming and Software

Distribution Statement: