Beyond Class A: A Proposal for Automatic Evaluation of Discourse
UNISYS DEFENSE SYSTEMS PAOLI PA
The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query-response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways: (1) Database Application: constrain the application to a database query application, to ease the burden of (a) constructing the back-end and (b) determining correct responses; (2) Canonical Answer: constrain answer comparison to a minimal canonical answer that imposes the fewest constraints on the form of system response displayed to a user at each site; (3) Typed Input: constrain the evaluation to typed input only; (4) Class A: constrain the test set to single, unambiguous, intelligible utterances, taken without context, that have well-defined database answers ("class A" sentences). These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.
- Voice Communications