Adaptive Hindi OCR Using Generalized Hausdorff Image Comparison
MARYLAND UNIV COLLEGE PARK INST FOR ADVANCED COMPUTER STUDIES
In this paper, we present an adaptive Hindi OCR using generalized Hausdorff image comparison, implemented as part of a rapidly retargetable language tool report. The system includes script identification, character segmentation, training sample creation, and character recognition. The OCR design, completed in one month, was applied to a complete Hindi-English bilingual dictionary of 1083 pages and to a collection of ideal images extracted from Hindi documents in PDF format. Experimental results show that character-level recognition accuracy can reach 88% for noisy images and 95% for ideal images. The presented method can also be extended to design OCR systems for other scripts.
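To illustrate the kind of image comparison the abstract refers to, here is a minimal sketch of a rank-based (partial) Hausdorff distance between two binary character images, used for nearest-template classification. The abstract does not give the report's exact formulation, so this is an assumption based on the commonly used partial Hausdorff distance. The function names, the `fraction` parameter, and the toy images are all hypothetical.

```python
import math


def foreground_points(image):
    """Return (row, col) coordinates of the nonzero pixels of a 2D list."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]


def directed_partial_hausdorff(a_pts, b_pts, fraction=1.0):
    """Rank-based directed distance from point set a_pts to point set b_pts.

    For every point in a_pts, take the distance to its nearest point in b_pts,
    then return the value at the given rank fraction instead of the maximum.
    fraction=1.0 gives the classical directed Hausdorff distance; a smaller
    fraction ignores the worst-matching points (for example, noise pixels).
    """
    if not a_pts or not b_pts:
        raise ValueError("both images need at least one foreground pixel")
    nearest = sorted(min(math.dist(a, b) for b in b_pts) for a in a_pts)
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def generalized_hausdorff(img_a, img_b, fraction=0.9):
    """Symmetric distance: the larger of the two directed partial distances."""
    a_pts = foreground_points(img_a)
    b_pts = foreground_points(img_b)
    return max(
        directed_partial_hausdorff(a_pts, b_pts, fraction),
        directed_partial_hausdorff(b_pts, a_pts, fraction),
    )


def classify(glyph, templates, fraction=0.9):
    """Return the label of the template closest to the glyph (assumed scheme)."""
    return min(
        templates,
        key=lambda label: generalized_hausdorff(glyph, templates[label], fraction),
    )


# Toy 3x3 "characters": a horizontal bar, a vertical bar, and a noisy bar.
bar = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
col = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
noisy_bar = [[1, 1, 1], [0, 0, 0], [0, 0, 1]]

print(classify(noisy_bar, {"bar": bar, "col": col}, fraction=0.75))
```

Taking a ranked value instead of the maximum is what makes the measure tolerant of a few stray pixels: with `fraction=1.0` the single noise pixel in `noisy_bar` dominates the distance to `bar`, while a lower fraction discards it. A practical system would also need size normalization and a faster nearest-point search (for example, a distance transform), which this sketch omits.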