A Semi-automated Evaluation Metric for Dialogue Model Coherence
We propose a new metric, Voted Appropriateness, which can be used to automatically evaluate dialogue policy decisions once some wizard data has been collected. We show that this metric outperforms a previously proposed metric, Weak Agreement. We also present a taxonomy of dialogue model evaluation schemas and situate our new metric within it.
Approved For Public Release