A Point Matching Algorithm for Automatic Groundtruth Generation
MARYLAND UNIV COLLEGE PARK CENTER FOR AUTOMATION RESEARCH
Geometric groundtruth at the character, word, and line levels is crucial for developing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating character-level groundtruth for rescanned images. In this paper, we present a robust version of their methodology. We grouped the feature points and applied a feature point registration algorithm to the grouped feature point set to estimate the transformation. The Euclidean distance between character centroids was used as the error metric. We performed experiments on the University of Washington data set.
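The registration and error-metric steps described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the transformation model or how feature points are grouped and matched, so the sketch assumes an affine model and point correspondences that are already known. The names `solve3`, `fit_affine`, `apply_affine`, and `mean_centroid_error` are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (collinear or too few points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map src -> dst (ASSUMED model, known matches).

    Returns ((a, b, tx), (c, d, ty)) with x' = a*x + b*y + tx, y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (P^T P) theta = P^T q, solved once per output coordinate.
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        ptq = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(tuple(solve3(ptp, ptq)))
    return tuple(params)


def apply_affine(t, pt):
    """Map one (x, y) point through the fitted transform."""
    (a, b, tx), (c, d, ty) = t
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)


def mean_centroid_error(predicted, actual):
    """Mean Euclidean distance between paired character centroids."""
    dists = [math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(predicted, actual)]
    return sum(dists) / len(dists)
```

In use, one would fit the transform on matched feature points from the ideal and rescanned images, map the ideal character centroids through it, and report `mean_centroid_error` against centroids measured in the rescanned image. A robust variant would also need to handle outliers and unknown correspondences, which this sketch leaves out.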
- Numerical Mathematics