Effects of Speech Recognition Accuracy on the Performance of DARPA Communicator Spoken Dialogue Systems
NATIONAL INST OF STANDARDS AND TECHNOLOGY GAITHERSBURG MD
The DARPA Communicator program explored ways to construct better spoken-dialogue systems, with which users interact via speech alone to perform relatively complex tasks such as travel planning. During 2000 and 2001, two large data sets were collected from sessions in which paid users did travel planning with the Communicator systems built by eight research groups. The research groups improved their systems intensively during the ten months between the two data collections. In this paper, we analyze these data sets to estimate the effects of speech recognition accuracy, as measured by Word Error Rate (WER), on other metrics. The effects that we found were linear. We found a correlation between WER and Task Completion, and that correlation, unexpectedly, remained more or less linear even for high values of WER. The picture for User Satisfaction metrics is more complex: in the 2001 data, we found little effect of WER on User Satisfaction for WER below about 35% to 40%. The effect of WER on Task Completion was smaller in 2001 than in 2000. We believe this difference is due to improved strategies for accomplishing tasks despite speech recognition errors, which is an important accomplishment of the research groups who built the Communicator implementations. We show that additional factors must account for much of the variability in task success, and we present multivariate linear regression models for task success on the 2001 data. We also discuss apparent gaps in the coverage of our metrics for spoken dialogue systems.
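The abstract relies on two computations: WER, and a linear fit of a task-success measure against WER. Below is a minimal sketch of both. It assumes the standard definition WER = (substitutions + deletions + insertions) / reference word count, computed with a word-level Levenshtein alignment. It also assumes a plain single-predictor ordinary least-squares fit. The paper's own regression models are multivariate, and their predictors are not listed here. The function names `word_error_rate` and `least_squares_fit` are illustrative, not taken from the paper or from any Communicator toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the OLS line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

For example, the reference "i want to fly to boston" has six words. The hypothesis "i want to fly boston tuesday" differs from it by two edits: one deletion and one insertion. Its WER is therefore 2/6, about 0.33. Passing per-session WER values and task-completion scores to `least_squares_fit` yields a slope. A smaller negative slope corresponds to the weaker dependence of Task Completion on WER that the abstract reports for 2001.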
- Statistics and Probability
- Voice Communications