Accession Number:

ADA242516

Title:

Advanced Research in Contextual Analysis of Addresses: Phase 3

Descriptive Note:

Draft rept. for period ending Apr 91,

Corporate Author:

ENVIRONMENTAL RESEARCH INST OF MICHIGAN ANN ARBOR

Report Date:

1991-06-01

Pagination or Media Count:

76

Abstract:

This report describes the continued development and testing of a system for contextual analysis of machine-printed address block images. The system receives a binary image of the address block (location of the address block is not a part of this work) and then (1) segments the image into lines, words, and characters with multiple hypotheses; (2) assigns class confidence to each character hypothesis using neural networks; (3) locates, reads, and reconciles the city name and ZIP code; (4) parses the address block using keyword recognition; (5) if a PO Box is found, reads the box number and verifies it against the postal directory; otherwise, (6) forms a street name lexicon based on contextual information, including number of street name words, word lengths, recognition of suffix and directionals, and the ZIP code; (7) forms an additional street name lexicon based on partial recognition of the street words; (8) uses word recognition within these lexicons to rank street name hypotheses; (9) retrieves street and range records from a postal directory; (10) matches information from the retrieved records to the fields on the mailpiece, forming 9-digit ZIP code hypotheses; and (11) applies decision logic to assign the finest supportable depth of sort. In an end-to-end test on data selected for OCR difficulty, using corrected LOS scoring, the system had an encode rate of 50% with 9.5% error and an accept rate of 84% with 9.3% error. This compares favorably with an encode rate of 16.7% with 13.6% error and an accept rate of 61% with 15.5% error achieved by the current MLOCR machine on this same dataset.
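
As an illustration of step (6) only, the following is a minimal sketch of how a street name lexicon might be narrowed using contextual cues (ZIP code, number of street name words, approximate word lengths, suffix, and directional). It is not the report's implementation: the record layout, the `contextual_lexicon` function, the `length_tolerance` parameter, and the toy directory entries are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreetRecord:
    """Hypothetical directory entry; the real postal directory format is not given."""
    zip_code: str
    name_words: tuple   # street name words, e.g. ("MAIN",)
    suffix: str         # e.g. "ST", "RD"
    directional: str    # e.g. "N"; empty string if none


# Toy data invented for illustration only.
DIRECTORY = [
    StreetRecord("48105", ("MAIN",), "ST", "N"),
    StreetRecord("48105", ("PLYMOUTH",), "RD", ""),
    StreetRecord("48105", ("GREEN",), "RD", ""),
    StreetRecord("48105", ("MAPLE",), "ST", ""),
    StreetRecord("48104", ("STATE",), "ST", "S"),
]


def contextual_lexicon(directory, zip_code, word_lengths,
                       suffix=None, directional=None, length_tolerance=1):
    """Return candidate street names consistent with the observed context.

    word_lengths holds the estimated character count of each street name
    word from segmentation; length_tolerance (an assumed parameter) allows
    for segmentation error. suffix/directional of None mean "not recognized",
    so no filtering is applied on that cue.
    """
    lexicon = []
    for rec in directory:
        if rec.zip_code != zip_code:
            continue
        if len(rec.name_words) != len(word_lengths):
            continue
        if any(abs(len(word) - n) > length_tolerance
               for word, n in zip(rec.name_words, word_lengths)):
            continue
        if suffix is not None and rec.suffix != suffix:
            continue
        if directional is not None and rec.directional != directional:
            continue
        lexicon.append(" ".join(rec.name_words))
    return lexicon
```

For example, with this toy directory, `contextual_lexicon(DIRECTORY, "48105", [5], suffix="ST")` keeps only one-word street names of roughly five characters with suffix ST in that ZIP code. A later stage, such as the word recognition in step (8), would then rank the surviving candidates.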

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE