Multiple Outliers in Linear Regression: Advances in Detection Methods, Robust Estimation, and Variable Selection
Abstract:
Empirical evidence suggests that unusual or outlying observations in data sets are much more prevalent than one might expect, averaging roughly 5 to 10 percent in many industries. This research addresses multiple outliers in the linear regression model. Although reliable for a single outlier or a few outliers, standard diagnostic techniques from an ordinary least squares (OLS) fit can fail to identify multiple outliers. The parameter estimates, diagnostic quantities, and model inferences from a contaminated data set can differ significantly from those obtained with the clean data, so the researcher requires a dependable method to identify and accommodate these multiple outliers. This research tests both direct methods (detection algorithms) and indirect methods (robust regression estimators) for identifying multiple outliers. A comprehensive Monte Carlo simulation study, using a designed experiment approach, evaluates the impact that outlier density and geometry, regressor variable dimension, and outlying distance have on numerous published methods, focusing on outlier configurations likely to be encountered in practice. The results for each scenario reveal the strengths and limitations of each technique, and recommendations are given accordingly.

OLS is the optimal regression estimator under a set of assumptions on the distribution of the error term and the predictor variables. Compound robust regression estimators have been proposed as alternatives when some OLS assumptions fail: they can accommodate multiple outliers while limiting the influence of observations with remote levels of the predictor variables. This research proposes a new compound estimator that is more effective than currently published methods for extreme observations in X space and for high-dimensional problems. This research also addresses the variable selection problem for compound robust regression estimators.
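As a hedged illustration of the masking phenomenon the abstract refers to (not a method from this research), the following NumPy sketch shows how a cluster of high-leverage outliers can drag an OLS fit toward itself, so that the outliers' own residuals from the contaminated fit appear unremarkable even though their residuals from the true line are enormous. All data values and the contamination pattern here are invented for illustration.

```python
# Minimal sketch of masking by multiple high-leverage outliers in
# simple linear regression. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Clean data generated from the true line y = 2x + noise.
n = 50
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 0.5, n)

# Five clustered, high-leverage outliers far off the true line.
x_out = np.full(5, 20.0)
y_out = np.full(5, 5.0)
xc = np.concatenate([x, x_out])
yc = np.concatenate([y, y_out])

def ols(xv, yv):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(xv), xv])
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    return beta

b_clean = ols(x, y)      # slope near the true value of 2
b_contam = ols(xc, yc)   # slope dragged far from 2 by the cluster

# Because the contaminated fit passes near the outlier cluster, the
# outliers' residuals from that fit are modest (they mask one another),
# while their residuals from the true line are huge.
resid_contam = y_out - (b_contam[0] + b_contam[1] * x_out)
resid_true = y_out - 2 * x_out

print("clean-fit slope:       ", b_clean[1])
print("contaminated-fit slope:", b_contam[1])
print("max |outlier residual|, contaminated fit:", np.abs(resid_contam).max())
print("max |outlier residual|, true line:       ", np.abs(resid_true).max())
```

This is exactly the failure mode that single-outlier OLS diagnostics inherit: deletion statistics computed one observation at a time never see the fit that would result if the whole cluster were removed at once.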