Principal Curves and Surfaces
STANFORD UNIV CA LAB FOR COMPUTATIONAL STATISTICS
Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set. They minimize the distance from the points and provide a non-linear summary of the data. The curves are non-parametric, and their shape is suggested by the data. Similarly, principal surfaces are two-dimensional surfaces that pass through the middle of the data. The curves and surfaces are found using an iterative procedure that starts with a linear summary, such as the usual principal component line or plane. Each successive iteration is a smooth or local average of the p-dimensional points, where local is based on the projections of the points onto the curve or surface of the previous iteration. A number of linear techniques, such as factor analysis and errors-in-variables regression, end up using the principal components as their estimates after a suitable scaling of the coordinates. Principal curves and surfaces can be viewed as the estimates of non-linear generalizations of these procedures. Principal curves and surfaces have a theoretical definition for distributions: they are the self-consistent curves and surfaces. A curve is self-consistent if each point on the curve is the conditional mean of the points that project there. The main theorem proves that principal curves are critical values of the expected squared distance between the points and the curve. Linear principal components have this property as well; in fact, we prove that if a principal curve is straight, then it is a principal component. These results generalize the usual duality between conditional expectation and distance minimization. We also examine two sources of bias in the procedures, which have the satisfactory property of partially cancelling each other.
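The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the report's implementation: the running-mean smoother, the polyline projection, the parameters span and n_iter, and all function names are assumptions chosen for illustration. The sketch starts from the first principal component, then alternates between a local average of the points (ordered by their current projections) and a re-projection of the points onto the resulting curve.

```python
import numpy as np

def first_pc_scores(X):
    """Projections of the centered points onto the first principal component line."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def running_mean_smooth(lam, X, span):
    """Local average of the p-dimensional points; 'local' is defined by the
    ordering of the projections lam. (Assumed smoother: a plain running mean.)"""
    n = len(lam)
    k = max(2, int(span * n))          # number of neighbours in each window
    Xs = X[np.argsort(lam)]            # points ordered along the current curve
    curve = np.empty_like(Xs, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        curve[i] = Xs[lo:hi].mean(axis=0)
    return curve                       # curve vertices, ordered along the curve

def project_to_polyline(X, curve):
    """Project each point onto the polyline through the curve vertices.
    Returns the arc-length position and squared distance of each projection."""
    start = curve[:-1]
    vec = curve[1:] - curve[:-1]
    len2 = (vec ** 2).sum(axis=1)
    seg_len = np.sqrt(len2)
    denom = np.maximum(len2, 1e-12)    # guard against zero-length segments
    arc_at_start = np.concatenate([[0.0], np.cumsum(seg_len)])[:-1]
    diff = X[:, None, :] - start[None, :, :]
    # t[i, j]: relative position of point i along segment j, clipped to [0, 1]
    t = np.clip((diff * vec[None]).sum(axis=2) / denom, 0.0, 1.0)
    foot = start[None] + t[..., None] * vec[None]
    d2 = ((X[:, None, :] - foot) ** 2).sum(axis=2)
    j = d2.argmin(axis=1)              # closest segment for each point
    idx = np.arange(len(X))
    lam = arc_at_start[j] + t[idx, j] * seg_len[j]
    return lam, d2[idx, j]

def principal_curve(X, span=0.3, n_iter=10):
    """Alternate smoothing and projection, starting from the first principal component."""
    X = np.asarray(X, dtype=float)
    lam = first_pc_scores(X)           # linear starting summary
    history = []                       # mean squared distance after each iteration
    for _ in range(n_iter):
        curve = running_mean_smooth(lam, X, span)
        lam, d2 = project_to_polyline(X, curve)
        history.append(d2.mean())
    return curve, lam, history
```

Calling principal_curve(X) on an n-by-p array returns the curve vertices, the arc-length position of each point's projection, and the mean squared distance recorded at each iteration. A fixed iteration count is used here for simplicity; a convergence check on the change in mean squared distance would be a natural alternative.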
- Statistics and Probability