DOCUMENT FORMAT RECOGNITION.
SYLVANIA ELECTRIC PRODUCTS INC WALTHAM MASS
This study is concerned primarily with methods for analyzing the format of pages from technical journals, and with means for automatically processing the textual and graphic material on those pages for input to a computer that performs textual data processing functions such as automatic language translation, automatic abstracting, and automatic indexing. This analysis and processing includes text-graphic separation, location of graphics, and textual analysis and recognition. The overall process is treated as a Format Recognition and Analysis Program running on a computer-controlled character recognition device. The study has produced general design techniques for Format Recognition and Analysis Programs that apply to any document in which text and graphics are intermixed. Two such programs have been completed, tested, and demonstrated on two technical journals, one Soviet and one U.S., and a third program has been outlined and partly written for another Soviet journal. It was found that a program can be written for almost any journal without serious difficulty, although each new journal requires a substantially different program. (Author)