Learning New Words from Spontaneous Speech: A Project Summary
CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE
This research develops methods that enable spoken language systems to detect and correct their own errors and to extend themselves automatically to incorporate new words. The occurrence of unknown, or out-of-vocabulary, words is one of the major problems frustrating the use of automatic speech understanding systems in real-world tasks. Novel words cause recognition errors and often result in recognition and understanding failures. Yet they are common. Real system users speak in a spontaneous and relatively unconstrained fashion. They do not know what words the system can recognize and are therefore likely to exceed the system's coverage. Even if speakers constrained their speech, there would still be a need for self-extending systems, since certain tasks inherently require dynamic vocabulary expansion (e.g., new company names or new flight destinations). Further, it is costly and labor-intensive to collect enough training data to develop a representative vocabulary lexicon and language model for a spoken interface application. For transcription tasks it is often possible to find large amounts of on-line data from which a lexicon and language model can be developed, but for many other tasks this is not feasible. Developers of applications and database interfaces will probably not have the resources to gather a large corpus of examples to train a system for their specific task. Yet most current speech and language model research is oriented toward training from large corpora. This research enables systems to be developed from small amounts of data and then bootstrapped. A simple version of a system is.