A Method for the Removal of Redundancy in Printed Text.
ILLINOIS UNIV URBANA COORDINATED SCIENCE LAB
A class of methods for removing redundancy from printed text, called ID-methods, was developed. ID-methods take into account only the statistics of word occurrences in printed text. Nevertheless, models show that these methods can encode English text at a cost as low as 1.5 binary digits per character. This figure compares favorably with Shannon's upper bound on the entropy of printed English, 1.3 bits per character, which was determined by an experiment that implicitly took into account the syntactic structure and semantics of English. An encoding experiment was performed that verified the cost predictions and assessed the complexity of using ID-methods. Text could be encoded at a rate on the order of a few thousand characters per second. An analysis indicates that text encoded using an ID-method could be decoded at a rate of 250,000 characters per second on a computer such as the IBM 360/75. (Author)
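The cost figures above are in bits per character, derived from word-occurrence statistics. As a rough illustration of that unit, the sketch below computes the empirical entropy of the word distribution in a sample and divides it by the average word length. It is not the ID-method itself, whose details this abstract does not give. The function name `word_entropy_bits_per_char` and the convention of counting one separator character per word are assumptions made for the example.

```python
import math
from collections import Counter


def word_entropy_bits_per_char(text: str) -> float:
    """Empirical word-level entropy of `text`, expressed in bits per character.

    Illustrative assumption: words are whitespace-delimited tokens, and each
    word is charged one extra character for the separator that follows it.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")

    counts = Counter(words)
    total = len(words)

    # Shannon entropy of the observed word distribution, in bits per word.
    bits_per_word = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )

    # Average number of characters per word, including one separator each.
    chars_per_word = sum(len(w) + 1 for w in words) / total

    return bits_per_word / chars_per_word


# Toy check: two equally likely one-letter words cost 1 bit per word,
# spread over 2 characters (letter plus separator).
print(word_entropy_bits_per_char("a b a b"))  # 0.5
```

On a small sample this estimate is optimistic: it ignores the cost of conveying the word list to the decoder, and rare words are under-represented, so it should not be read as a reproduction of the 1.5 bits per character reported here.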