Recent Improvements in the CMU Spoken Language Understanding System
CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
We have been developing a spoken language system to recognize and understand spontaneous speech. It is difficult for such systems to achieve good coverage of the lexicon and grammar that subjects might use, because spontaneous speech often contains disfluencies and ungrammatical constructions. Our goal is to respond appropriately to input even when coverage is incomplete. The natural language component of our system is oriented toward extracting the information relevant to a task, and seeks to directly optimize the correctness of the extracted information and therefore of the system response. We use a flexible frame-based parser, which parses as much of the input as possible. This approach leads to both high accuracy and robustness. We have implemented a version of this system for the Air Travel Information Service (ATIS) task, which is being used by several ARPA-funded sites to develop and evaluate speech understanding systems. Users are asked to perform a task that requires getting information from an air travel database. In this paper, we describe recent improvements in our system resulting from our efforts to improve coverage given a limited amount of training data. These improvements address a number of problems, including generating an adequate lexicon and grammar for the recognizer, generating and generalizing an appropriate grammar for the parser, and dealing with ambiguous parses.
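
To make the idea of a frame-based parser that "parses as much of the input as possible" concrete, the following is a minimal illustrative sketch, not the system's actual parser or grammar. It assumes a hypothetical three-slot frame (`origin`, `destination`, `day`), a made-up city list, and simple regular-expression patterns standing in for slot grammars. Each slot is filled independently, so disfluencies and words outside the patterns are skipped rather than causing the whole parse to fail.

```python
import re

# Hypothetical vocabulary and slot patterns, assumed for illustration only;
# these are not the grammar of the system described above.
CITIES = r"(pittsburgh|boston|denver)"
DAYS = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"

SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom " + CITIES + r"\b"),
    "destination": re.compile(r"\bto " + CITIES + r"\b"),
    "day": re.compile(r"\bon " + DAYS + r"\b"),
}


def parse_frame(utterance):
    """Fill every slot whose pattern matches; ignore the rest of the input."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Keep only the slot value, dropping the cue word ("from", "to", "on").
            frame[slot] = match.group(1)
    return frame


if __name__ == "__main__":
    # Filled pauses ("uh", "um") and unmodeled words are simply skipped.
    print(parse_frame("uh I want to um fly from Pittsburgh to Boston on Friday please"))
```

Because each slot is matched on its own, an utterance that is only partly covered still yields a partially filled frame, which a response component could act on or use to ask for the missing information.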