# Accession Number:

## AD1069713

# Title:

## Probabilistic Programming with Missing Data

# Descriptive Note:

## Technical Report

# Corporate Author:

## Massachusetts Institute of Technology Cambridge United States

# Personal Author(s):

# Report Date:

## 2019-02-26

# Pagination or Media Count:

## 53

# Abstract:

This report summarizes work performed to develop a computer algorithm capable of handling missing data fields in multivariate data sets. The results presented here build on prior work that examined the applicability of inverse covariance matrices, or precision matrices, to representing missing data as zero eigenvalues in the precision matrices. The prior work used maximum a posteriori (MAP) estimates for a combination of normally distributed multivariate data with normally distributed multivariate measurement errors, and assumed that the prior probability distributions for means and precision matrices were uniform. This work extends the previous technique to one that uses normal-Wishart distributions to describe the prior probability distribution for a normal data distribution, and to estimate posterior parameters for these distributions for multivariate data with missing fields. While the integrals needed to estimate posterior probabilities from the likelihood and prior probability distributions may in fact be analytically solvable, the authors were unable to discover such a solution. Instead, analytic integral solutions were used for individual probability measurements, and a probabilistic programming language was used to perform numerical integration on the remaining integrals. The chosen solution leverages the strengths of each integration method to address the weaknesses of the complementary method: analytic integration is performed where numerical integration cannot be performed, and numerical integration is performed where analytic solutions are currently unknown. Most of the recent work was performed by an undergraduate intern for Group 104 at MIT Lincoln Laboratory during the summer of 2018. A model for the problem is defined and analyzed mathematically. A discussion of the probabilistic programming languages and programs is also provided, along with results for a number of simulations.
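The idea of representing a missing field as a zero eigenvalue of a precision matrix can be illustrated with a minimal sketch. This is not the report's algorithm: it assumes known, independent Gaussian measurement noise and a uniform prior on the mean, and the function names `measurement_precision` and `fuse_measurements` are hypothetical names chosen for illustration. Under those assumptions, the estimate of the mean is the precision-weighted average of the measurements, and a missing field simply contributes zero precision.

```python
import numpy as np

def measurement_precision(noise_var, observed):
    """Diagonal precision matrix for one measurement (illustrative assumption:
    independent noise). Missing fields get zero precision, i.e. a zero eigenvalue."""
    noise_var = np.asarray(noise_var, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    return np.diag(np.where(observed, 1.0 / noise_var, 0.0))

def fuse_measurements(measurements, precisions):
    """Precision-weighted mean of the measurements (uniform prior on the mean)."""
    total = sum(precisions)  # combined precision matrix
    # Replace NaN placeholders by 0; they are multiplied by zero precision anyway.
    weighted = sum(P @ np.nan_to_num(x) for P, x in zip(precisions, measurements))
    return np.linalg.solve(total, weighted), total

# Two measurements of a 2-field record; the second field of x1 is missing.
x1 = np.array([1.0, np.nan])
x2 = np.array([3.0, 4.0])
P1 = measurement_precision([0.5, 0.5], [True, False])
P2 = measurement_precision([0.5, 0.5], [True, True])

mean, total_precision = fuse_measurements([x1, x2], [P1, P2])
```

Here `P1` has one zero eigenvalue, so the first measurement carries no information about the second field, and the fused estimate for that field comes entirely from `x2`. The combined precision must be invertible, which requires every field to be observed in at least one measurement.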

# Descriptors:

# Subject Categories:

- Statistics and Probability