Latent Variable Graphical Modeling for High Dimensional Data Analysis
Technical Report, 15 Jun 2016, 14 Jun 2019
California Institute of Technology, Pasadena, United States
An outstanding challenge in many applications throughout science and engineering is to succinctly characterize the relationships among a large number of interacting entities. For example, in computational biology, a typical question about gene regulatory networks is how to discover the interaction patterns among a collection of genes in order to better understand their biological function. Similar problems arise in analyzing word frequencies in a large corpus of text documents, in community detection in social networks, in the analysis of networks of water reservoirs in the geosciences, and in competitive interaction problems in economics. To address these challenges in a unified manner, this proposal originally aimed to develop new methodology based on statistical models defined on graphs, since graphs often provide a concise representation of the interactions among a large set of variables. Over the course of the past three years, we developed new algorithmic frameworks based on convex optimization for tasks such as associating semantics with latent variables, evaluating the statistical confidence of latent variable model selection methods, finding hidden structured subgraphs inside larger networks, obtaining bounds on deviations between models specified by two networks, and fitting convex shapes to tomographic data. We demonstrate the applicability of our new methodology in domains such as modeling the California reservoir network, hyperspectral imaging, recommender systems, and comparing molecular structures in chemistry.
- Statistics and Probability
- Operations Research