Hyperdimensional Data Analysis and Structural Inference
Final technical report, 1 Apr 1987 - 31 Mar 1989
GEORGE MASON UNIV FAIRFAX VA CENTER FOR COMPUTATIONAL STATISTICS
This research project was based on the belief that modern technology has substantially changed the flavor of the problems being presented to the statistician. Electronic instrumentation implies an ability to acquire large amounts of high-dimensional data very rapidly. While such capabilities have existed for some time, the emergence of cheap RAM in the 1980s has made it possible to store and access those data in active computer memory. This presents the statistician with a challenge that is substantially different in kind. The majority of existing methodology focuses on the univariate, iid random-variable model. Even when a multivariate model is allowed, it is usually assumed to be multivariate normal. While arbitrary sample size is frequently assumed, the truth of the matter is that these techniques implicitly assume small to moderate sample sizes. For example, a regression problem with 5 design variables and 1,000 observations would pose no difficulty for traditional techniques; a regression problem with 40,000 design variables and 8 million observations would. The reason is clear: in the former case the emphasis is on statistical efficiency, which is the operational goal of most current statistical technology, while in the latter case the emphasis must clearly be on computational efficiency. The emphasis on parsimony in many contemporary books and papers further reflects a mind-set that implicitly focuses on small to moderate sample sizes, since models with few parameters make little sense in the context of very large sample sizes. Finally, we note that the very largeness of the sample size makes it unlikely that we would see iid homogeneity.
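The contrast between the two regression problems can be made concrete with a back-of-envelope flop count. A sketch follows, assuming ordinary least squares is fit via the normal equations, where forming X'X costs roughly n*p^2 operations and the subsequent Cholesky solve roughly p^3/3; the specific cost model and figures are illustrative and not taken from the report itself.

```python
# Approximate cost of fitting ordinary least squares via the normal
# equations, for n observations and p design variables.
# Cost model (an assumption for illustration): forming X'X takes
# about n*p^2 floating-point operations, and the Cholesky
# factorization/solve about p^3/3.

def ols_flops(n, p):
    """Rough flop count for OLS with n observations, p variables."""
    return n * p**2 + p**3 / 3

small = ols_flops(1_000, 5)            # the traditional-sized problem
large = ols_flops(8_000_000, 40_000)   # the modern-sized problem

print(f"small problem: {small:.2e} flops")   # on the order of 10^4
print(f"large problem: {large:.2e} flops")   # on the order of 10^16
print(f"ratio: {large / small:.1e}")
```

Under this crude model the large problem costs roughly eleven orders of magnitude more arithmetic than the small one, which is why statistical efficiency alone cannot be the operational goal at that scale.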
- Information Science
- Statistics and Probability