Accession Number:

ADA458239

Title:

Document Image Compression and Analysis

Descriptive Note:

Corporate Author:

MARYLAND UNIV COLLEGE PARK INST FOR ADVANCED COMPUTER STUDIES

Personal Author(s):

Report Date:

1997-04-01

Pagination or Media Count:

142.0

Abstract:

Image compression usually treats the minimization of storage space as its main objective. It is desirable, however, to code images so that the resulting representation can be processed directly. In this thesis we explore an approach to document image compression that is efficient both in storage space and in processing time and flexibility. A representation is presented in which component-level redundancy is removed by forming a prototype library and a component location table. This representation forms a basis for compression and provides direct access to image components. To generate the prototype library, a new clustering approach suited to document image components is developed. The distance metric is based on a character degradation model, so that degraded versions of the same character are grouped together. When a lossless representation is required, the residuals are encoded efficiently using a structural distance ordering. OCR is then used as a measure of readability to evaluate the rate-distortion trade-off for lossy compression. A set of algorithms is presented for typical document processing applications that operate effectively on the compressed representation.
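The prototype-library idea in the abstract can be illustrated with a minimal sketch, assuming each connected component is given as a position plus a binary bitmap. Several substitutions are made here and are labeled as such: plain Hamming distance (count of mismatched pixels) stands in for the thesis's degradation-model distance, a greedy first-match rule stands in for its clustering approach, and the residual is kept as a raw XOR bitmap rather than being encoded with a structural distance ordering. The function names `mismatch`, `build_library`, and `reconstruct` are hypothetical.

```python
import numpy as np

def mismatch(a, b):
    """Hamming distance between two same-shaped binary bitmaps.
    Assumption: a stand-in for the thesis's degradation-model distance."""
    return int(np.count_nonzero(a != b))

def build_library(components, threshold):
    """Greedy first-match clustering (hypothetical, not the thesis's method).

    components: list of ((x, y), bool ndarray) pairs.
    Returns (prototypes, location_table, residuals)."""
    prototypes = []
    table = []      # one row (prototype_index, x, y) per component
    residuals = []  # component XOR prototype, kept for lossless decoding
    for (x, y), bitmap in components:
        for idx, proto in enumerate(prototypes):
            if proto.shape == bitmap.shape and mismatch(proto, bitmap) <= threshold:
                break  # reuse this prototype
        else:
            # No close-enough prototype: the component starts a new cluster.
            prototypes.append(bitmap.copy())
            idx = len(prototypes) - 1
        table.append((idx, x, y))
        residuals.append(np.logical_xor(bitmap, prototypes[idx]))
    return prototypes, table, residuals

def reconstruct(prototypes, table, residuals, lossless=True):
    """Rebuild components from the library and location table.
    With lossless=False the residuals are dropped (lossy mode)."""
    out = []
    for (idx, x, y), res in zip(table, residuals):
        bitmap = prototypes[idx]
        if lossless:
            bitmap = np.logical_xor(bitmap, res)  # prototype XOR residual
        out.append(((x, y), bitmap))
    return out
```

Because each table row refers to a prototype index, operations such as counting or locating occurrences of one character shape can scan the location table without decoding any pixels, which is the kind of direct access the abstract describes. Dropping the residuals gives a lossy reconstruction in which every component is replaced by its prototype.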

Subject Categories:

  • Numerical Mathematics
  • Operations Research
  • Optics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE