Accession Number : ADA573719


Title : Leveraging Multimodal Redundancy for Dynamic Learning, with SHACER - a Speech and HAndwriting reCognizER


Descriptive Note : Doctoral thesis


Corporate Author : OREGON HEALTH AND SCIENCE UNIV PORTLAND


Personal Author(s) : Kaiser, Edward C


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a573719.pdf


Report Date : Apr 2007


Pagination or Media Count : 287


Abstract : New language constantly emerges from complex, collaborative human-human interactions like meetings, such as when a presenter handwrites a new term on a whiteboard while saying it redundantly. Fixed-vocabulary recognizers fail on such new terms, which are often critical to dialogue understanding. This dissertation presents SHACER, our Speech and HAndwriting reCognizER (pronounced "shaker"). SHACER learns out-of-vocabulary terms dynamically by integrating information from instances of redundant handwriting and speaking. SHACER can automatically populate an MS Project™ Gantt chart by observing a whiteboard scheduling meeting. To document the occurrence and importance of such multimodal redundancy, we examine (1) whiteboard presentations, (2) a spontaneous brainstorming meeting, and (3) informal annotation discussions about travel photographs. Averaged across these three contexts, 96.5% of handwritten words were also spoken redundantly. We also find that redundantly presented terms are (a) highly topic-specific and thus likely to be out-of-vocabulary, (b) more memorable, and (c) significantly better query terms for later search and retrieval. To combine information, SHACER normalizes handwriting and speech recognizer outputs by applying letter-to-sound and sound-to-letter transformations. SHACER then uses an articulatory-feature-based distance metric to align handwriting to redundant speech. Phone sequence information from that aligned segment then constrains a second-pass phone recognition over cached speech features. The resulting refined pronunciation serves as a measure against which the integration of all orthographic and pronunciation hypotheses is scored. High-scoring integrations are enrolled in the system's dictionaries and reinforcement tables. When a presenter subsequently says a newly enrolled term, it is more easily recognized.


Descriptors : *HANDWRITING , *SPEECH RECOGNITION , INFORMATION RETRIEVAL , INTERACTIONS , LANGUAGE , LEARNING , MULTIMODE , SPEECH , THESES , VOCABULARY , WORDS(LANGUAGE)


Subject Categories : Voice Communications


Distribution Statement : APPROVED FOR PUBLIC RELEASE