Monaural Speech Segregation by Integrating Primitive and Schema-Based Analysis
Final rept. Feb 2004-Dec 2007
OHIO STATE UNIV COLUMBUS DEPT OF COMPUTER SCIENCE AND ENGINEERING
The natural auditory environment typically contains multiple simultaneous events. A remarkable feat of auditory perception is the ability to disentangle the acoustic mixture and group the components of the same event into a stream. This aspect of human audition is called auditory scene analysis (ASA), which comprises a primitive, bottom-up process and a schema-based, top-down process. A major task of auditory scene analysis is monaural segregation of speech from interfering sounds. This project seeks to develop an auditory scene analysis approach to monaural speech segregation. Consistent with its stated objectives, the project has made considerable progress in the following four directions. First, we have proposed a schema-based model for phonemic restoration, which refers to the perceptual synthesis, using lexical context, of phonemes that are masked by appropriate replacement sounds. Second, we have developed an approach to the problem of sequential organization that is based on trained speaker models. Third, we have proposed an approach to segmenting auditory scenes based on event detection, in an attempt to address the segregation of unvoiced speech. Fourth, we have developed a comprehensive system for segregating unvoiced speech, a long-standing challenge in computational auditory scene analysis (CASA). In addition, encouraging progress has been made on enhancing reverberant speech and on modeling multitalker speech perception. A provisional patent application entitled "A method for accurate pitch estimation and voice separation" has been filed as a result of this AFOSR grant. Executive summaries of the doctoral dissertations supported can be found at the back of the reports.