Accession Number:

AD1028715

Title:

Scalable Topic Modeling: Online Learning, Diagnostics, and Recommendation

Descriptive Note:

Technical Report

Corporate Author:

COLUMBIA UNIV NEW YORK NEW YORK United States

Personal Author(s):

Report Date:

2017-03-01

Pagination or Media Count:

5.0

Abstract:

The main activity of my research group is to build and develop the probabilistic pipeline. When solving problems with data, we take the following steps. 1. We make assumptions about our data, embedding it in a probability model containing hidden and observed random variables. 2. Given observations, we use inference algorithms to estimate the conditional distribution of the hidden variables. This is the central statistical and computational problem. 3. With the results of inference, we use our model to form predictions about the future, explore the data, or otherwise apply what we learned to solve a problem. 4. We criticize our model, understand where it went right and wrong, and repeat the process to revise it.

Subject Categories:

Distribution Statement:

APPROVED FOR PUBLIC RELEASE