Accession Number : ADA624286


Title :   Interactive Algorithms for Unsupervised Machine Learning


Descriptive Note : Doctoral thesis


Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE


Personal Author(s) : Krishnamurthy, Akshay


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a624286.pdf


Report Date : Jun 2015


Pagination or Media Count : 149


Abstract : This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive learning algorithms for three unsupervised problems: subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. An over-arching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting. We first study the subspace learning problem, where the goal is to recover or approximate the principal subspace of a collection of partially observed data points. We propose statistically and computationally appealing interactive algorithms for both the matrix completion problem, where the data points lie on a low dimensional subspace, and the matrix approximation problem, where one must approximate the principal components of a collection of points. We measure uniformity with the notion of incoherence, and we show that our feedback-driven algorithms perform well under much milder incoherence assumptions. We next consider clustering a dataset represented by a partially observed similarity matrix. We propose an interactive procedure for recovering a clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size, using few measurements to recover larger clusters and focusing measurements on the smaller structures.


Descriptors :   *LEARNING MACHINES , ALGORITHMS , APPROXIMATION(MATHEMATICS) , CLUSTERING , DATA ACQUISITION , EXPERIMENTAL DATA , MATRICES(MATHEMATICS) , NETWORK ARCHITECTURE , PROBABILITY , RELATIONAL DATA BASES , SAMPLING , SIMULATION , STATISTICAL ANALYSIS , STATISTICAL INFERENCE , TENSOR ANALYSIS , THESES , TOMOGRAPHY


Subject Categories : Statistics and Probability
      Cybernetics


Distribution Statement : APPROVED FOR PUBLIC RELEASE