Model-Based Clustering and Data Transformations for Gene Expression Data
GEORGE WASHINGTON UNIV WASHINGTON DC DEPT OF STATISTICS
Clustering is a useful exploratory technique for the analysis of gene expression data, and many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. Model-based clustering assumes that the data are generated by a finite mixture of underlying probability distributions, such as multivariate normal distributions. This Gaussian mixture model has been shown to be a powerful tool for many applications. In addition, within the probability framework, the issues of selecting a good clustering method and determining the correct number of clusters reduce to model selection problems. We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach showed superior performance on our synthetic data sets, consistently selecting the correct model and the right number of clusters.
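The idea that choosing the number of clusters becomes a model selection problem can be illustrated with a minimal sketch. It is not the report's implementation. It assumes one-dimensional data, a plain EM fit, and the Bayesian information criterion (BIC) as the selection score, none of which the abstract specifies. The function names, the quantile initialisation, the variance floor, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in xs:
        mix = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(mix, 1e-300))  # guard against underflow to zero
    return total


def fit_mixture(xs, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumption: deterministic start with means placed at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(pooled_var, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            norm = max(sum(dens), 1e-300)
            resp.append([d / norm for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)  # ad hoc floor to avoid collapse
    return weights, means, variances


def bic(xs, k):
    """BIC in the 'lower is better' convention: -2 log L + p log n."""
    weights, means, variances = fit_mixture(xs, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -2.0 * log_likelihood(xs, weights, means, variances) + n_params * math.log(len(xs))


# Synthetic data (an assumption for this sketch): two well-separated groups.
rng = random.Random(1)
data = [rng.gauss(0.0, 0.5) for _ in range(150)] + [rng.gauss(5.0, 0.5) for _ in range(150)]

bics = {k: bic(data, k) for k in range(1, 5)}
best_k = min(bics, key=bics.get)
print(bics, best_k)
```

Each candidate number of clusters is treated as a competing model, and the one with the lowest score is kept. A multivariate version would add covariance parameters to the penalty term, and different covariance structures could then be compared in the same way.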
- Genetic Engineering and Molecular Biology
- Operations Research