Nonlinear Auditory Modeling as a Basis for Speaker Recognition
MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB
In this report, we develop a front-end nonlinear auditory model based on recent work of Dau, Puschel, and Kohlrausch (DPK) [Dau, Puschel, and Kohlrausch, 1997]. An important aspect of the model is its robust accentuation of temporal change in a signal at the cochlear level, which forms the basis of a feature set for automatic speaker recognition. Preliminary speaker recognition experiments with the DPK front-end auditory model give performance close to that of the standard mel-cepstrum. Fusion of scores from the mel-cepstrum and the DPK front-end auditory model, however, is shown to give a useful performance gain relative to the standard mel-cepstrum alone. The dynamics captured by the nonlinear auditory model therefore appear to provide some orthogonality to the more static mel-cepstral representation. In addition, we provide initial development of new common modulation features based on modeling a region of auditory processing more central than the low-level auditory front-end: the brain's inferior colliculus. These higher-level features rely on the DPK auditory model as a foundation for further analysis of low-level temporal trajectories. This new feature representation is an important research direction and provides additional feature orthogonality to front-end auditory processing, as exhibited by improved speaker recognition performance when scores from the low-level and high-level feature sets are fused.
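The score fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the report uses, so the code below assumes a simple weighted sum of z-normalized scores. The function names (`z_normalize`, `fuse_scores`), the `weight` parameter, and the example score values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so no trial is favored over another.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(mel_scores, dpk_scores, weight=0.5):
    """Fuse two per-trial score lists by a weighted sum (assumed rule).

    mel_scores: scores from a mel-cepstrum system, one per trial.
    dpk_scores: scores from a DPK-based system, same trials, same order.
    weight:     hypothetical weight on the mel-cepstrum system, in [0, 1].
    """
    if len(mel_scores) != len(dpk_scores):
        raise ValueError("both systems must score the same trials")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Normalize each system separately so neither score scale dominates.
    mel_z = z_normalize(mel_scores)
    dpk_z = z_normalize(dpk_scores)
    return [weight * m + (1.0 - weight) * d for m, d in zip(mel_z, dpk_z)]


if __name__ == "__main__":
    # Made-up scores for four trials, for illustration only.
    mel = [2.1, -0.3, 1.4, -1.0]
    dpk = [0.8, 0.1, 0.9, -0.6]
    print(fuse_scores(mel, dpk, weight=0.6))
```

Normalizing each system's scores before summing keeps one system from dominating purely because of its score range. In practice the weight would be tuned on held-out data, and fusion helps most when the two systems make different errors, which is the sense of orthogonality the abstract refers to.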
- Voice Communications
- Anatomy and Physiology