How Well Can an Agent Understand Different Accents?

Tadimeti, Divya; Georgila, Kallirroi; Traum, David

Active / Technical Report | Accession Number: AD1183484

Abstract:

We evaluate several state-of-the-art automatic speech recognition systems on dialogue agent-directed English speech from speakers with General American vs. non-American accents. Our results show that the performance of the speech recognizers for non-American accents is considerably worse than for General American accents, with approx. 20 percent higher word error rate on average (relative difference). This work indicates a need for more diligent collection of and training on non-native English speaker data in order to narrow this performance gap. There are performance differences across recognizers, and while the same general pattern holds, with more errors for non-American accents, there are some accents for which the best recognizer is different than in the overall case. We expect these results to be useful for dialogue system designers in developing more robust inclusive dialogue systems, and for speech recognition providers in taking into account performance requirements for different accents.
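
The abstract reports its gap as a relative difference in word error rate (WER). As a minimal sketch of how such a figure is typically computed, assuming the standard word-level edit-distance definition of WER (the record does not describe the report's own scoring tools), the two quantities could be calculated as follows. The function names and the sample sentences are hypothetical and chosen only for illustration; none of the values come from the report.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization and exact string matching; real
    evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (Levenshtein, word level).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_difference(baseline_wer: float, other_wer: float) -> float:
    """Relative WER difference, expressed as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (other_wer - baseline_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative utterance only: one dropped word out of four.
    print(word_error_rate("turn on the light", "turn on light"))
    # Illustrative WER values only, not results from the report.
    print(relative_difference(0.10, 0.12))
```

Under this definition, a relative difference of 0.20 means the second group's WER is 20 percent higher than the baseline's, which is distinct from a 20-point absolute gap.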

Author(s):

Tadimeti, Divya ; Georgila, Kallirroi ; Traum, David

Author Organization(s):

UNIVERSITY OF SOUTHERN CALIFORNIA LOS ANGELES

Funding Organization(s):

ARMY RESEARCH LAB ADELPHI MD, ADELPHI, MD

Document Type:

Technical Report/Research Paper

Publication Date:

2020 Jan 01

Pagination:

4

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution Code:

A - Approved For Public Release

Distribution Statement: Public Release.

Copyright: Not Copyrighted

RECORD

Collection: TRECMS

Identifying Numbers

Contract Number(s):

W911NF-14-D-0005

Subject Terms

Descriptor(s):

automated speech recognition, language, audio files, computer science, dialogue systems, test and evaluation, new york, recognition, united states, agreements, standards, word recognition, african americans, california, computer languages, computers

Keyword(s):

ASR (Automatic speech recognition), Accents, non-native English speakers, WER (Word Error Rate)

Subject Categories:

Mathematical and Computer Sciences; Behavioral and Social Sciences

Modernization Areas:

AI and Machine Learning

Creation Date:

2022 Oct 28

Update Date:

2023 Oct 18