When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints
CALIFORNIA UNIV IRVINE
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime. While general overcomplete admixtures are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of expansion conditions on the population structure, i.e., the topic-word matrix of the persistent topic model. Specifically, we require the existence of a perfect matching from hidden variables to higher-order observed variables, and can thus incorporate overcomplete models. In particular, we establish that random models are identifiable with high probability in the overcomplete regime. Moreover, our analysis allows for general non-degenerate distributions for modeling the topic proportions. Our proof techniques incorporate a novel class of tensor decompositions which fall in between the well-known CANDECOMP/PARAFAC (CP) and Tucker decompositions, and provide novel conditions for unique tensor decomposition.
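As a point of reference for the uniqueness claims above, Kruskal's classical condition already certifies uniqueness of some overcomplete CP decompositions: for a symmetric third-order decomposition with factor matrix A of Kruskal rank k_A, uniqueness holds whenever 3 k_A >= 2r + 2, which permits r (the number of components) to exceed the observed dimension n. The sketch below, with illustrative parameters n = 5 and r = 6 (not taken from the paper), checks this condition numerically for a generic random factor matrix:

```python
import itertools
import numpy as np

def kruskal_rank(A):
    """Largest k such that every set of k columns of A is linearly independent."""
    n, r = A.shape
    for k in range(min(n, r), 0, -1):
        if all(np.linalg.matrix_rank(A[:, list(cols)]) == k
               for cols in itertools.combinations(range(r), k)):
            return k
    return 0

rng = np.random.default_rng(0)
n, r = 5, 6                       # r > n: an overcomplete factor matrix
A = rng.standard_normal((n, r))   # generic (random Gaussian) topic-word matrix

kA = kruskal_rank(A)              # generically min(n, r) = 5 for a random matrix
unique = 3 * kA >= 2 * r + 2      # Kruskal's sufficient condition: 15 >= 14
print(kA, unique)
```

Note that Kruskal's bound caps r at roughly 1.5n; the expansion conditions developed in the paper are a different route to uniqueness, aimed at a broader overcomplete regime than this classical check covers.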