Re-Construction of Reference Population and Generating Weights by Decision Tree
DEFENSE EQUAL OPPORTUNITY MANAGEMENT INST PATRICK AFB FL United States
The DEOCS received responder data, which contain no records for non-respondents, directly through the survey, and unit population data through DMDC. To estimate statistical characteristics of the population, the DEOCS team merged the unit population data into the survey data, producing a dataset of 260,000 cases. However, the non-response rate exceeds 60 percent, so the responder data may not be representative of the population. Weighting is therefore needed to compensate for non-response and avoid bias. Computing post-stratification weights takes three steps: first, design and implement an algorithm in Python to re-construct the population; second, compute the weights; third, weight the response cases and analyze them. Two methods were adopted for computing weights. The first computes post-stratification weights from crosstabs and is used to produce two types of weights: Type 1 weights with respect to the unit reference population, and Type 2 weights with respect to the whole reference population. The second method computes weights with a logistic regression approach, in which an SPSS decision tree with the CHAID module was used to compute the probabilities of the predictor factors for the logistic regression. Finally, we compare the effects of the weights from these two methods on the distributions of variables.
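The population re-construction step can be sketched as follows. This is a minimal illustration, not the algorithm used here: it assumes the unit population data reduce to a head count per stratification cell, and the cell variables `unit` and `sex`, the helper names, and the toy counts are all hypothetical. Every respondent record is kept with `responded=1`, and each cell is topped up with placeholder non-respondent records (`responded=0`) until it matches the population count.

```python
from collections import Counter

# Hypothetical stratification variables; the actual cell definitions are not given.
KEYS = ("unit", "sex")


def cell_of(record):
    """Map a record to its stratification cell, e.g. ("A", "M")."""
    return tuple(record[k] for k in KEYS)


def reconstruct_population(cell_totals, respondents):
    """Rebuild one record per population member.

    cell_totals: {cell: N}, population count per cell (assumed to come from
    the unit population data). respondents: list of respondent dicts.
    """
    resp_counts = Counter(cell_of(r) for r in respondents)
    unknown = set(resp_counts) - set(cell_totals)
    if unknown:
        raise ValueError(f"respondent cells missing from population: {unknown}")
    # Keep every respondent, flagged as a responder.
    rebuilt = [dict(r, responded=1) for r in respondents]
    for cell, total in cell_totals.items():
        missing = total - resp_counts[cell]  # Counter returns 0 for absent cells
        if missing < 0:
            raise ValueError(f"more respondents than population in cell {cell}")
        # Placeholder non-respondents carry only the cell variables.
        rebuilt.extend(dict(zip(KEYS, cell), responded=0) for _ in range(missing))
    return rebuilt


# Toy example with hypothetical counts: 10 members, 6 respondents.
cell_totals = {("A", "M"): 4, ("A", "F"): 2, ("B", "M"): 2, ("B", "F"): 2}
respondents = [dict(zip(KEYS, c)) for c, k in
               {("A", "M"): 2, ("A", "F"): 1, ("B", "M"): 1, ("B", "F"): 2}.items()
               for _ in range(k)]
rebuilt = reconstruct_population(cell_totals, respondents)
print(len(rebuilt), sum(r["responded"] for r in rebuilt))  # 10 6
```

The response flag is what later lets a response model be fitted on the full population rather than on respondents alone.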
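The crosstab-based post-stratification weights can be illustrated with a short sketch. It assumes the standard post-stratification weight, the population count divided by the respondent count in the respondent's cell (N_c / n_c). As an assumption about what "unit" versus "whole" reference population means, the two weight types are modelled by whether the unit is part of the cell. The variables `unit` and `sex`, the helper names, and the toy data are hypothetical.

```python
from collections import Counter


def poststrat_weights(population, respondents, cell_of):
    """Weight each respondent by N_c / n_c for its cell c.

    population: every member of the (re-constructed) reference population.
    respondents: the subset who answered the survey.
    cell_of: function mapping a record to its crosstab cell.
    """
    pop_counts = Counter(cell_of(r) for r in population)
    resp_counts = Counter(cell_of(r) for r in respondents)
    return [pop_counts[cell_of(r)] / resp_counts[cell_of(r)] for r in respondents]


def expand(spec):
    """Expand {(unit, sex): count} into a list of record dicts."""
    return [{"unit": u, "sex": s} for (u, s), k in spec.items() for _ in range(k)]


# Hypothetical toy data: 10 members, 6 respondents.
population = expand({("A", "M"): 4, ("A", "F"): 2, ("B", "M"): 2, ("B", "F"): 2})
respondents = expand({("A", "M"): 2, ("A", "F"): 1, ("B", "M"): 1, ("B", "F"): 2})

# Type 1 (assumed): cells nested within units, so the weights in each unit
# sum to that unit's population size.
w1 = poststrat_weights(population, respondents, lambda r: (r["unit"], r["sex"]))
# Type 2 (assumed): cells defined over the whole reference population.
w2 = poststrat_weights(population, respondents, lambda r: r["sex"])

print(w1)  # [2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
print(round(sum(w2), 6))  # 10.0
```

Both weight sets sum to the population size here, but only the Type 1 set also reproduces each unit's size. Cells that contain population members but no respondents receive no weight in this sketch, so in practice such cells would have to be collapsed with neighbouring cells.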
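For the logistic regression method, a common construction (assumed here, since the exact formula is not stated) fits the probability of responding on the re-constructed population and weights each respondent by the inverse of that predicted probability. The sketch below fits the model by plain gradient descent using only the standard library. The CHAID step is not reproduced: the single hypothetical dummy predictor `sex` stands in for whatever groupings the decision tree would supply, and the toy data are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X: rows of predictor values, first column = 1.0 (intercept).
    y: response indicators (1.0 = responded, 0.0 = did not).
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical re-constructed population: (sex, responded).
population = [("M", 1)] * 3 + [("M", 0)] * 3 + [("F", 1)] * 3 + [("F", 0)] * 1
X = [[1.0, 1.0 if sex == "F" else 0.0] for sex, _ in population]
y = [float(r) for _, r in population]

beta = fit_logistic(X, y)
# Inverse-propensity weight for each respondent: 1 / P(respond | x).
weights = [1.0 / sigmoid(sum(b * v for b, v in zip(beta, xi)))
           for xi, yi in zip(X, y) if yi == 1.0]
print([round(w, 4) for w in weights])  # [2.0, 2.0, 2.0, 1.3333, 1.3333, 1.3333]
```

Because one dummy makes the model saturated for these two groups, the fitted probabilities equal the observed response rates (0.5 and 0.75), and the weights coincide with the crosstab ratio of population count to respondent count per group. With more cells than model parameters the two methods generally give different weights.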
- Computer Programming and Software
- Statistics and Probability