Title : Principal Curves and Surfaces

Descriptive Note : Technical rept.

Corporate Author : STANFORD UNIV CA LAB FOR COMPUTATIONAL STATISTICS

Personal Author(s) : Hastie, Trevor

Report Date : Nov 1984

Pagination or Media Count : 107

Abstract : Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set. They minimize the distance from the points, and provide a non-linear summary of the data. The curves are non-parametric and their shape is suggested by the data. Similarly, principal surfaces are two-dimensional surfaces that pass through the middle of the data. The curves and surfaces are found using an iterative procedure which starts with a linear summary such as the usual principal component line or plane. Each successive iteration is a smooth or local average of the p-dimensional points, where local is based on the projections of the points onto the curve or surface of the previous iteration. A number of linear techniques, such as factor analysis and errors-in-variables regression, end up using the principal components as their estimates (after a suitable scaling of the co-ordinates). Principal curves and surfaces can be viewed as the estimates of non-linear generalizations of these procedures. Principal curves (or surfaces) have a theoretical definition for distributions: they are the self-consistent curves. A curve is self-consistent if each point on the curve is the conditional mean of the points that project there. The main theorem proves that principal curves are critical values of the expected squared distance between the points and the curve. Linear principal components have this property as well; in fact, we prove that if a principal curve is straight, then it is a principal component. These results generalize the usual duality between conditional expectation and distance minimization. We also examine two sources of bias in the procedures, which have the satisfactory property of partially cancelling each other.
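
Illustrative note : The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal sketch, not the report's implementation: the function name `principal_curve`, the parameters `n_iter` and `span`, the running-mean smoother, and the nearest-vertex projection are all assumptions chosen for brevity. It starts from the first principal component line, then alternates a local-averaging step with a projection step.

```python
import numpy as np

def principal_curve(X, n_iter=5, span=0.3):
    """Hypothetical sketch: alternate local averaging and projection.

    X      : (n, p) array of data points
    n_iter : number of smooth/project iterations (assumed, must be >= 1)
    span   : fraction of points used in each local average (assumed)
    Returns the curve vertices (ordered along the curve), the projection
    index of every point, and the mean squared distance to the curve.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    center = X.mean(axis=0)

    # Starting summary: the first principal component line.
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    lam = (X - center) @ vt[0]  # position of each point along the line

    half = max(1, int(span * n) // 2)  # half-width of the averaging window
    for _ in range(n_iter):
        # Smoothing step: average the p-dimensional points that are
        # neighbours in the ordering given by the previous projections.
        Xs = X[np.argsort(lam)]
        curve = np.empty_like(Xs)
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            curve[i] = Xs[lo:hi].mean(axis=0)

        # Projection step (crude): assign each point to its nearest
        # curve vertex and use that vertex's arc length as the new index.
        seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
        arc = np.concatenate([[0.0], np.cumsum(seg)])
        d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        lam = arc[d2.argmin(axis=1)]

    return curve, lam, d2.min(axis=1).mean()
```

On data scattered around a bent shape such as a half circle, the mean squared distance returned by this sketch would be expected to fall below the residual variance left by the first principal component line, which is the sense in which the curve is a non-linear summary. Projecting onto the nearest vertex rather than onto the line segments between vertices is a simplification; a more careful version would project onto the segments.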

Descriptors : *NONPARAMETRIC STATISTICS , *INFORMATION THEORY , *DISTRIBUTION CURVES , LINEAR SYSTEMS , METHODOLOGY , GRAPHS , ESTIMATES , SURFACES , CURVATURE , NONLINEAR SYSTEMS , VALUE , BIAS , ITERATIONS , FACTOR ANALYSIS

Subject Categories : Statistics and Probability
Cybernetics

Distribution Statement : APPROVED FOR PUBLIC RELEASE