Speech Segregation based on Binary Classification
Technical Report, 01 May 2012 – 30 Apr 2016
The Ohio State University, Columbus, United States
This AFOSR project aimed to develop a classification-based approach to the speech segregation problem. This supervised approach stands in sharp contrast to traditional speech segregation methods. The project made four major accomplishments. First, a supervised approach based on neural networks was developed to perform pitch tracking in very noisy conditions. Second, different training targets were examined for supervised speech segregation, leading to the adoption of the ideal ratio mask (IRM). A subsequent listening evaluation showed increased intelligibility in noise for human listeners following IRM estimation. Third, an algorithm was proposed to recognize speakers in cochannel (two-talker) conditions. This algorithm uses deep neural networks for cochannel speaker identification and achieves state-of-the-art results in both anechoic and reverberant conditions. Fourth, a spectral mapping method was developed to address robustness to room reverberation. This supervised method learns a mapping from the magnitude spectrogram of reverberant speech to that of anechoic speech, as well as from the spectrogram of reverberant-noisy speech to that of anechoic-clean speech.
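To make the IRM training target concrete, the sketch below computes the commonly used form of the ideal ratio mask, IRM = (S² / (S² + N²))^β with β = 0.5, over time-frequency units of speech and noise magnitude spectrograms. This is a minimal illustration of the standard definition, not code from the report; the function name, the toy spectrogram sizes, and the small epsilon added for numerical safety are all assumptions.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask over T-F units: (S^2 / (S^2 + N^2))^beta.

    speech_mag, noise_mag: magnitude spectrograms (frames x freq bins).
    beta = 0.5 is a common choice; the epsilon guards against
    division by zero in silent units (illustrative detail).
    """
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta

# Toy example: 3 frames x 4 frequency bins of random magnitudes.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, (3, 4))   # stand-in clean-speech magnitudes
N = rng.uniform(0.0, 1.0, (3, 4))   # stand-in noise magnitudes
mask = ideal_ratio_mask(S, N)

# Applying the estimated mask to the noisy mixture attenuates
# noise-dominant units while preserving speech-dominant ones.
mixture = S + N                     # crude magnitude-domain mixture
enhanced = mask * mixture
```

Because each mask value lies in [0, 1], a learned estimator of the IRM can use a bounded output nonlinearity (e.g. a sigmoid), which is one practical reason ratio masks proved attractive as training targets.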