Hybrid SIS and Markov Chain Monte Carlo Sampling Methodology for Goodness-of-Fit Tests on Contingency Tables
Naval Postgraduate School, Monterey, United States
Logistic regression is one of the most popular means of modeling contingency table data due to its ease of use. Simple asymptotic inference, such as the χ² approximation used to evaluate goodness-of-fit tests, may however not be valid for sparse datasets with cell counts below 5. In these cases, exact conditional inference is often attempted via a sampler, such as Markov Chain Monte Carlo (MCMC) or Sequential Importance Sampling (SIS). This paper proposes a hybrid sampling scheme that combines MCMC and SIS to sample sparse, multidimensional contingency tables satisfying fixed marginals when MCMC alone does not guarantee an exhaustive sampling of the conditional state space. To investigate its suitability, the proposed hybrid scheme is applied to an observational dataset from Alzheimer's researcher J. A. Mortimer measuring the cognitive states of nuns over a 15-year period beginning in 1991. Applying the proposed scheme, we find that the p-values estimated by the hybrid MCMC and SIS sampler are remarkably similar to the χ² asymptotic approximation p-values, even for sparse contingency tables.
- Statistics and Probability
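As a rough illustration of the exact conditional inference mentioned in the abstract, the sketch below estimates a goodness-of-fit p-value for a small two-way table by running a plain Metropolis MCMC walk over tables with the same row and column sums. It is not the hybrid MCMC and SIS scheme the paper proposes, and it uses a simple independence model rather than logistic regression. The function names, the default tuning parameters, and the toy table are all hypothetical and chosen only for this example.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic under row/column independence."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def mcmc_step(table, rng):
    """One Metropolis step that preserves all row and column sums.

    Picks an ordered pair of rows and of columns, then proposes +1 on
    (i1, j1) and (i2, j2) and -1 on (i1, j2) and (i2, j1). Because the
    pairs are ordered, the reverse move is proposed equally often.
    """
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    a, b = table[i1][j1], table[i1][j2]
    c, d = table[i2][j1], table[i2][j2]
    # Target is proportional to 1 / prod(cell!), so the acceptance ratio
    # is b*c / ((a+1)*(d+1)); it is 0 when the move would go negative.
    ratio = (b * c) / ((a + 1) * (d + 1))
    if rng.random() < ratio:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def estimate_p_value(observed, n_samples=2000, burn_in=500, thin=10, seed=0):
    """Fraction of sampled tables at least as extreme as the observed one."""
    rng = random.Random(seed)
    current = [row[:] for row in observed]  # work on a copy
    observed_stat = chi_square(observed)
    for _ in range(burn_in):
        mcmc_step(current, rng)
    hits = 0
    for _ in range(n_samples):
        for _ in range(thin):
            mcmc_step(current, rng)
        if chi_square(current) >= observed_stat - 1e-9:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical sparse 3x3 table, not taken from the paper's dataset.
    toy_table = [[3, 0, 1], [0, 2, 1], [1, 1, 4]]
    print("chi-square statistic:", chi_square(toy_table))
    print("estimated conditional p-value:", estimate_p_value(toy_table))
```

For two-way tables these simple moves connect every table that shares the observed margins. The abstract's motivation for a hybrid scheme is that, for sparse multidimensional tables, MCMC alone may not reach the whole conditional state space.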