Morphological Degradation Models and their Use in Document Image Restoration
Abstract:
Document images undergo various degradation processes, and numerous models of these processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The algorithm first estimates the parameters of a degradation model and then uses the estimated model to compute the probability of an ideal binary pattern given the noisy observed pattern. This conditional probability is estimated by degrading noise-free document images and computing the frequency of corresponding noise-free and noisy pattern pairs; it is then used to construct a lookup table for restoring noisy images. The impact of the restoration process is quantified by the decrease in OCR word and character error rates. We find that, given the estimated degradation model parameter values, the restoration algorithm decreases the character error rate by 16.1% and the word error rate by 7.35%. In some categories of degradation, e.g., model parameters that give rise to broken characters, there is a 41.5% reduction in character error rate and a 20.4% reduction in word error rate.
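The lookup-table construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper. It makes three simplifying assumptions: independent pixel flipping stands in for the estimated degradation model, patterns are 3x3 windows, and the table stores only the most frequent ideal centre pixel for each noisy window rather than a full ideal pattern. The names `degrade`, `window`, `build_lookup_table`, and `restore` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def degrade(image, flip_prob, rng):
    """Stand-in degradation (assumption): flip each pixel independently."""
    return [[1 - p if rng.random() < flip_prob else p for p in row]
            for row in image]


def window(image, r, c):
    """Return the 3x3 neighbourhood of (r, c) as a tuple, zero-padded."""
    h, w = len(image), len(image[0])
    return tuple(
        image[i][j] if 0 <= i < h and 0 <= j < w else 0
        for i in range(r - 1, r + 2)
        for j in range(c - 1, c + 2)
    )


def build_lookup_table(ideal_images, flip_prob, rng, rounds=20):
    """Count (noisy window, ideal centre pixel) pairs over simulated
    degradations and keep the most frequent ideal value per noisy window."""
    counts = defaultdict(Counter)
    for ideal in ideal_images:
        for _ in range(rounds):
            noisy = degrade(ideal, flip_prob, rng)
            for r in range(len(ideal)):
                for c in range(len(ideal[0])):
                    counts[window(noisy, r, c)][ideal[r][c]] += 1
    return {pat: ctr.most_common(1)[0][0] for pat, ctr in counts.items()}


def restore(noisy, table):
    """Replace each pixel by its table entry; windows never seen during
    table construction fall back to the observed pixel."""
    return [
        [table.get(window(noisy, r, c), noisy[r][c])
         for c in range(len(noisy[0]))]
        for r in range(len(noisy))
    ]


# Toy usage: a small noise-free glyph, degraded and then restored.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
rng = random.Random(0)
table = build_lookup_table([glyph], flip_prob=0.05, rng=rng)
restored = restore(degrade(glyph, 0.05, rng), table)
```

In this sketch the relative frequencies in `counts` play the role of the estimated conditional probability, and taking the most frequent value per noisy window is one simple way to turn that estimate into a lookup table.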