Accession Number:

AD0747987

Title:

The Influence of Selected Factors on Shrinkage and Overfit in Multiple Correlation

Descriptive Note:

Corporate Author:

NAVAL AEROSPACE MEDICAL INST PENSACOLA FLA

Personal Author(s):

Report Date:

1971-09-07

Pagination or Media Count:

87

Abstract:

Weighting the variables in a regression equation so as to maximize prediction of a criterion presents several problems. Optimal weighting in the sample case means that chance-related error is also weighted indiscriminately. Because such error will not relate to the criterion in subsequent samples, a sample multiple correlation R will on the average be larger than the population value (overfit), and its value on cross-validation will be lower than in the equation-development sample (shrinkage). The influence of characteristics of the population and other conditions of the sampling situation on the outcome and stability of the regression equation has not been well understood. In particular, the role played by the relationship of initial predictor set size M to sample size N has not received adequate attention. The report attempted to examine and isolate the role of sampling error in the magnitude and stability of sample multiple R values obtained by incremental test selection techniques. The effect of selected factors on the impact of sampling error was examined. Three proposed shrinkage estimation formulas were evaluated for effectiveness, and a search was conducted for more efficient formulas incorporating the M/N ratio. Methods of controlling shrinkage and overfit were discussed and evaluated. (Author)
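
Illustrative note (not part of the original report): the overfit and shrinkage effects described in the abstract can be reproduced with a small simulation. The sketch below is a minimal example under stated assumptions: predictors are independent standard normal variables, only the first predictor relates to the criterion, and weights are fitted by ordinary least squares on all M predictors rather than by the incremental test selection techniques the report studied. The function names, default parameter values, and the adjusted R-squared correction (commonly attributed to Wherry) are choices made here for illustration; the abstract does not identify the three formulas the report evaluated.

```python
import numpy as np


def multiple_r(X, y, coef=None):
    """Return (R, coef): the correlation between predicted and observed y.

    If coef is None, least-squares weights (with intercept) are fitted on
    (X, y); otherwise the supplied weights are applied unchanged.
    """
    A = np.column_stack([np.ones(len(X)), X])
    if coef is None:
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    return np.corrcoef(pred, y)[0, 1], coef


def wherry_adjusted_r2(r2, n, m):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - m - 1).

    Assumption: used here only as a familiar example of a shrinkage
    correction, not as one of the report's formulas.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


def simulate(n=40, m=10, rho2=0.25, reps=500, seed=0):
    """Return (population R, mean development R, mean cross-validated R)."""
    rng = np.random.default_rng(seed)
    # Hypothetical population: y = b * x1 + e, with unit-variance e,
    # so the population R^2 equals b^2 / (b^2 + 1) = rho2.
    b = np.sqrt(rho2 / (1.0 - rho2))
    dev_rs, cv_rs = [], []
    for _ in range(reps):
        # Development sample: weights are optimized here.
        X_dev = rng.standard_normal((n, m))
        y_dev = b * X_dev[:, 0] + rng.standard_normal(n)
        r_dev, coef = multiple_r(X_dev, y_dev)
        # Fresh sample: the same weights are applied without refitting.
        X_cv = rng.standard_normal((n, m))
        y_cv = b * X_cv[:, 0] + rng.standard_normal(n)
        r_cv, _ = multiple_r(X_cv, y_cv, coef)
        dev_rs.append(r_dev)
        cv_rs.append(r_cv)
    return float(np.sqrt(rho2)), float(np.mean(dev_rs)), float(np.mean(cv_rs))


if __name__ == "__main__":
    pop_r, mean_dev, mean_cv = simulate()
    print("population R:", pop_r)
    print("mean development-sample R (overfit):", mean_dev)
    print("mean cross-validated R (shrinkage):", mean_cv)
```

Raising m relative to n in this sketch widens the gap between the development-sample and cross-validated values, which is the M/N relationship the abstract singles out.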

Subject Categories:

  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE