Learning Distance Functions for Exemplar-Based Object Recognition
CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
This thesis investigates an exemplar-based approach to object recognition that learns, on an image-by-image basis, the relative importance of patch-based features for determining similarity. We represent images as sets of patch-based features. To find the distance between two images, we first find, for each patch in one image, its nearest patch in the other image and compute their inter-patch distance. The weighted sum of these inter-patch distances is defined to be the distance between the two images. The main contribution of this thesis is a method for learning a set-to-set distance function specific to each training image, together with a demonstration of these functions for image browsing, retrieval, and classification. The goal of the learning algorithm is to assign a non-negative weight to each patch-based feature of an image such that the most useful patches receive large weights while irrelevant or confounding patches receive zero weight. We formulate this as a large-margin optimization and discuss two versions: a focal version that learns weights for each image separately, and a global version that jointly learns the weights for all training images. In the focal version, the distance functions learned for the training images are not directly comparable to one another and are most directly applicable to in-sample tasks such as image browsing, though with heuristics or additional learning they can be used for image retrieval or classification. The global version, by contrast, learns distance functions that are globally consistent and can be used directly for image retrieval and classification. Using geometric blur and simple color features, we show that both versions perform as well as or better than existing algorithms on the Caltech 101 object recognition benchmark. The global version achieves the best results: a 63.2% mean recognition rate when trained with fifteen images per category and 66.6% when trained with twenty.
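The weighted nearest-patch distance described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name, the array shapes, and the use of plain Euclidean inter-patch distance are assumptions (the thesis uses geometric blur and color features as patch descriptors).

```python
import numpy as np

def image_distance(patches_a, patches_b, weights_a):
    """Set-to-set distance from image A to image B (illustrative sketch).

    patches_a : (m, d) array of patch feature vectors for image A
    patches_b : (n, d) array of patch feature vectors for image B
    weights_a : (m,) non-negative per-patch weights learned for image A
    """
    # Pairwise Euclidean distances between every patch of A and of B
    diffs = patches_a[:, None, :] - patches_b[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)   # shape (m, n)
    # For each patch in A, the distance to its nearest patch in B
    nearest = pairwise.min(axis=1)             # shape (m,)
    # Weighted sum of nearest-patch distances defines the image distance
    return float(weights_a @ nearest)
```

Note that this distance is asymmetric (it depends on which image supplies the patches and weights), which is consistent with learning a separate distance function per training image.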