Interactive Algorithms for Unsupervised Machine Learning
CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
Pagination or Media Count:
This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive learning algorithms for three unsupervised problems subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. An over-arching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting. We first study the subspace learning problem, where the goal is to recover or approximate the principal subspace of a collection of partially observed data points. We propose statistically and computationally appealing interactive algorithms for both the matrix completion problem, where the data points lie on a low dimensional subspace, and the matrix approximation problem, where one must approximate the principal components of a collection of points. We measure uniformity with the notion of incoherence, and we show that our feedback-driven algorithms perform well under much milder incoherence assumptions. We next consider clustering a dataset represented by a partially observed similarity matrix. We propose an interactive procedure for recovering a clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size, using few measurements to recover larger clusters and focusing measurements on the smaller structures.
- Statistics and Probability