Open-Source Multi-Language Audio Database for Spoken Language Processing Applications
Final technical report, Apr 2010-Apr 2012
STATE UNIV OF NEW YORK AT BINGHAMTON DEPT OF ELECTRICAL AND COMPUTER ENGINEERING
This report gives a detailed summary of research work completed under Air Force Research Laboratory (AFRL) grant 53925 over the period April 12, 2010 to April 10, 2012. The work had two main aspects. The first was the collection and annotation of a large open-source database of speech passages from websites such as YouTube. In each of three languages (English, Mandarin, and Russian), 300 passages were collected, totaling approximately 30 hours of speech per language. Each passage was carefully transcribed at the phrasal level by a human listener, and the transcription was then checked and edited as needed by at least two additional native-language listeners. The English and Mandarin passages were then force-aligned and labeled at the phonetic level using a combination of manual and automatic methods; the Russian passages have not yet been labeled at the phonetic level. The second aspect of the work was to explore several algorithmic methods for improving automatic speech recognition (ASR) on this intelligible but challenging database. The body of the report has four main sections, plus appendices, which introduce, describe, and summarize a portion of the work.
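Forced alignment, as used for the phonetic labeling described above, means fixing the phone sequence from a known transcription and searching only for the time boundaries of each phone. The sketch below illustrates that idea with a generic Viterbi search; it is not the procedure used in this work. It assumes hypothetical frame-level phone log-likelihoods (`log_probs`) from some acoustic model, integer phone IDs, and at least one frame per phone; the function name `forced_align` is illustrative only.

```python
from typing import List, Tuple


def forced_align(log_probs: List[List[float]],
                 phones: List[int]) -> List[Tuple[int, int, int]]:
    """Viterbi forced alignment of a known phone sequence (illustrative sketch).

    log_probs[t][p]: hypothetical log-likelihood of phone ID p at frame t.
    phones: the transcript as phone IDs, in order; each must cover >= 1 frame.
    Returns (phone_id, start_frame, end_frame_exclusive) segments.
    """
    T, N = len(log_probs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    # score[t][j]: best log score with frame t assigned to transcript position j
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # transcript position used at frame t-1
    score[0][0] = log_probs[0][phones[0]]  # alignment must start at the first phone
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]                       # remain in the same phone
            advance = score[t - 1][j - 1] if j > 0 else NEG  # move to the next phone
            best, prev = (advance, j - 1) if advance > stay else (stay, j)
            if best == NEG:
                continue  # position j is unreachable at frame t
            score[t][j] = best + log_probs[t][phones[j]]
            back[t][j] = prev
    # Backtrace from the last frame, which must be assigned to the last phone.
    path = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j = back[t][j]
    # Collapse the frame-to-position path into contiguous phone segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((phones[path[start]], start, t))
            start = t
    return segments
```

For example, with three phones and six frames whose scores clearly favor phone 0, then 1, then 2 for two frames each, the search returns `[(0, 0, 2), (1, 2, 4), (2, 4, 6)]`. Because the transcript constrains the search to a left-to-right path through a fixed phone sequence, the cost is O(T·N) rather than a full recognition search, which is why forced alignment is a common way to bootstrap phone-level labels that human annotators can then correct.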
- Voice Communications