Accession Number:

AD1007967

Title:

Morphology-Based Language Modeling for Arabic Speech Recognition

Descriptive Note:

Conference Paper

Corporate Author:

SRI International Menlo Park United States

Report Date:

2004-10-08

Pagination or Media Count:

4

Abstract:

Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages of a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they reduce both perplexity and word error rate.
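The class-based N-best rescoring step mentioned in the abstract can be illustrated with a small sketch. This is not the paper's system. The abstract says only that the models use morphological word representations. The word-to-class mapping, the bigram order, the add-one smoothing, the LM weight of 8.0, and the toy transliterated tokens below are all hypothetical choices. In this sketch, each hypothesis's acoustic log-score is combined with a weighted class-bigram log-probability, and the N-best list is then re-sorted.

```python
import math
from collections import Counter


class ClassBigramLM:
    """Hypothetical class-based bigram LM:
    P(w | prev) = P(class(w) | class(prev)) * P(w | class(w)).

    Both factors use add-one smoothing (an illustrative assumption, not the
    paper's smoothing). Every word scored must appear in word2class.
    """

    def __init__(self, sentences, word2class):
        self.word2class = dict(word2class)
        self.word2class["</s>"] = "</s>"  # sentence end is its own class
        self.num_classes = len(set(self.word2class.values()))
        self.class_sizes = Counter(self.word2class.values())  # words per class
        self.bigrams = Counter()       # (previous class, class) counts
        self.history = Counter()       # counts of each class used as a history
        self.word_counts = Counter()   # word emission counts
        self.class_counts = Counter()  # class occurrence counts
        for sent in sentences:
            prev = "<s>"
            for w in list(sent) + ["</s>"]:
                c = self.word2class[w]
                self.bigrams[(prev, c)] += 1
                self.history[prev] += 1
                self.word_counts[w] += 1
                self.class_counts[c] += 1
                prev = c

    def logprob(self, words):
        """Natural-log probability of a word sequence, including </s>."""
        total, prev = 0.0, "<s>"
        for w in list(words) + ["</s>"]:
            c = self.word2class[w]
            # Smoothed class transition probability P(c | prev class).
            p_class = (self.bigrams[(prev, c)] + 1) / (self.history[prev] + self.num_classes)
            # Smoothed within-class emission probability P(w | c).
            p_word = (self.word_counts[w] + 1) / (self.class_counts[c] + self.class_sizes[c])
            total += math.log(p_class) + math.log(p_word)
            prev = c
        return total


def rescore_nbest(nbest, lm, lm_weight=8.0, word_penalty=0.0):
    """Rerank (words, acoustic_logprob) pairs by acoustic + weighted LM score.

    lm_weight and word_penalty are hypothetical tuning parameters.
    """
    def score(hyp):
        words, acoustic = hyp
        return acoustic + lm_weight * lm.logprob(words) + word_penalty * len(words)
    return sorted(nbest, key=score, reverse=True)


# Toy usage with hypothetical words and classes (N = noun-like, V = verb-like).
word2class = {"Alwld": "N", "Albnt": "N", "ktb": "V", "ylEb": "V"}
train = [["Alwld", "ktb"], ["Albnt", "ktb"], ["Albnt", "ylEb"]]
lm = ClassBigramLM(train, word2class)
nbest = [(["ktb", "Alwld"], -100.0), (["Alwld", "ylEb"], -100.5)]
ranked = rescore_nbest(nbest, lm)
print(ranked[0][0])  # → ['Alwld', 'ylEb']
```

The word pair "Alwld ylEb" never occurs in the toy training data. Its class sequence N V does occur, however, so the class model prefers it over the acoustically slightly better hypothesis. Sharing statistics across related word forms in this way is the motivation the abstract gives for morphology-based models.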

Subject Categories:

Distribution Statement:

APPROVED FOR PUBLIC RELEASE