Accession Number:

ADA463058

Title:

Comparing Evaluation Metrics for Sentence Boundary Detection

Descriptive Note:

Conference paper

Corporate Author:

TEXAS UNIV AT DALLAS RICHARDSON

Personal Author(s):

Report Date:

2007-01-01

Pagination or Media Count:

5

Abstract:

In recent NIST evaluations on sentence boundary detection, a single error metric was used to describe performance. Additional metrics, however, are available for such tasks, in which a word stream is partitioned into subunits. This paper compares alternative evaluation metrics, including the NIST error rate, classification error rate per word boundary, precision and recall, ROC curves, DET curves, precision-recall curves, and area under the curves, and discusses the advantages and disadvantages of each. Unlike many studies in machine learning, we use real data for a real task. We find benefit from using curves in addition to a single metric. Furthermore, we find that data skew has an impact on metrics, and that differences among system outputs are more visible in precision-recall curves. These results should help us better understand evaluation metrics and are expected to generalize to similar language processing tasks.
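The single-number metrics mentioned in the abstract can be illustrated with a small sketch. The helper below is a hypothetical illustration, not code from the paper: it assumes reference and hypothesis labels are given as 0/1 flags per word boundary (1 = sentence boundary), and computes precision, recall, the per-word-boundary classification error rate, and a NIST-style error rate whose denominator is the number of reference boundaries rather than all boundaries.

```python
def boundary_metrics(ref, hyp):
    """Per-boundary metrics for sentence boundary detection.

    ref, hyp: sequences of 0/1 labels, one per word boundary
    (1 = a sentence boundary is placed there).
    """
    tp = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 1)
    fp = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 1)
    fn = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 0)

    n_ref = sum(ref)   # number of reference sentence boundaries
    n_all = len(ref)   # number of word boundaries (candidate positions)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # NIST-style error: insertions + deletions per reference boundary;
    # note this can exceed 1.0, unlike the classification error rate.
    nist_error = (fp + fn) / n_ref if n_ref else 0.0
    cls_error = (fp + fn) / n_all if n_all else 0.0
    return precision, recall, nist_error, cls_error
```

Because the two error rates differ only in their denominator, the same system output looks very different under each when true boundaries are rare, which is one way the data skew discussed in the paper shows up.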

Subject Categories:

  • Linguistics
  • Test Facilities, Equipment and Methods
  • Voice Communications

Distribution Statement:

APPROVED FOR PUBLIC RELEASE