Scalable Topic Modeling: Online Learning, Diagnostics, and Recommendation
COLUMBIA UNIV NEW YORK NEW YORK United States
Pagination or Media Count:
The main activity of my research group is to build and develop the probabilistic pipeline. When solving problems with data, we take the following steps. 1. We make assumptions about our data, embedding it in a probability model containing hidden and observed random variables. 2. Given observations, we use inference algorithms to estimate the conditional distribution of the hidden variables. This is the central statistical and computational problem. 3. With the results of inference, we use our model to form predictions about the future, explore the data, or otherwise apply what we learned to solve a problem. 4. We criticize our model, understand where it went right and wrong, and repeat the process to revise it.