The Probability of Error on the Design Set as a Function of the Sample Size and Dimensionality.
ROME AIR DEVELOPMENT CENTER GRIFFISS AFB N Y
In many pattern classification problems, the classifier must be designed entirely from representative samples collected for each class. In these cases, both the classifier itself and its probability of error on the design set are random variables, since both are functions of the samples. For a number of two-class problems, the expected value of the probability of error on the design set is derived as a function of both the number of samples per class and the number of features. For the continuous case, the underlying class-conditional distributions are multivariate normal, and the classifier is a linear discriminant estimated from the samples. For the discrete case, the underlying class-conditional distributions are multinomial, and the classifier is based on the estimated bin probabilities for each class. The probability of error on the design set is shown to be an extremely biased estimate of the performance of the minimum probability of error classifier when the ratio of the number of samples per class to the number of features (or, in the discrete case, to the number of bins) is less than three. (Author)
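The discrete case described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the report's derivation, which is analytical; it is only a simulation under stated assumptions: equal prior probabilities, the same number of samples per class, and a classifier that assigns each bin to the class with the larger estimated bin probability. All function names and parameters below are hypothetical and chosen for illustration.

```python
import random


def draw_counts(probs, n, rng):
    """Draw n samples from a multinomial with bin probabilities `probs`
    and return the number of samples that fell in each bin."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def design_set_error(counts_a, counts_b):
    """Error rate of the estimated-bin-probability classifier when it is
    tested on the same samples it was designed from.

    With equal sample sizes and equal priors (assumptions of this sketch),
    each bin is assigned to the class with the larger count, so the smaller
    count in each bin is the number of design samples misclassified there.
    """
    total = sum(counts_a) + sum(counts_b)
    errors = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return errors / total


def expected_design_error(probs_a, probs_b, n, trials=2000, seed=0):
    """Monte Carlo estimate of the expected design-set error for
    n samples per class."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts_a = draw_counts(probs_a, n, rng)
        counts_b = draw_counts(probs_b, n, rng)
        total += design_set_error(counts_a, counts_b)
    return total / trials


def bayes_error(probs_a, probs_b):
    """Minimum probability of error when the true bin probabilities are
    known, assuming equal priors."""
    return 0.5 * sum(min(a, b) for a, b in zip(probs_a, probs_b))


if __name__ == "__main__":
    bins = 8
    uniform = [1.0 / bins] * bins  # identical classes: no classifier beats 0.5
    print("minimum probability of error:", bayes_error(uniform, uniform))
    for ratio in (1, 3, 10):
        n = ratio * bins
        est = expected_design_error(uniform, uniform, n)
        print(f"samples per class / bins = {ratio}: design-set error ~ {est:.3f}")
```

Making the two classes identical is a deliberately extreme choice: the minimum probability of error is then exactly 0.5, so any lower design-set error is purely optimistic bias. In this sketch the gap is widest when the ratio of samples per class to bins is small, and it narrows as the ratio grows.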
- Statistics and Probability