Accession Number : ADA259076


Title :   Cepstral and Auditory Model Features for Speaker Recognition


Descriptive Note : Master's thesis


Corporate Author : AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL OF ENGINEERING


Personal Author(s) : Colombi, John M


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a259076.pdf


Report Date : 01 Dec 1992


Pagination or Media Count : 126


Abstract : The TIMIT and KING databases, as well as a ten-day AFIT speaker corpus, are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The feature sets compared were Linear Predictive Coding (LPC) cepstral coefficients and auditory nerve firing rates using the Payton model. This auditory model accounts for the mechanisms found in the human middle and inner auditory periphery as well as neural transduction. Two clustering algorithms were used to generate speaker-specific codebooks, one statistically based and the other a neural approach: the Linde-Buzo-Gray (LBG) algorithm and a Kohonen self-organizing feature map (SOFM). The LBG algorithm consistently provided optimal codebook designs with correspondingly better classification rates. The resulting vector quantization (VQ) distortion-based classification indicates that the auditory model provides slightly reduced recognition in clean, studio-quality recordings (LPC 100%, Payton 90%), yet achieves performance similar to the LPC cepstral representation both in degraded environments (both 95%) and in test data recorded over multiple sessions (both over 98%). A variety of normalization techniques, preprocessing procedures and classifier fusion methods were examined on this biologically motivated feature set. Keywords: Speaker identification, Auditory models, Vector quantization, Neural networks, User verification.
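
Illustrative note (not part of the original record): the VQ distortion-based classification named in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. The function names (`lbg`, `identify`, `synthetic_frames`), the additive split perturbation, the squared Euclidean distortion, the codebook size of 4, and the synthetic two-dimensional "feature frames" are all hypothetical choices made for illustration. Real features would be LPC cepstral coefficients or auditory model firing rates.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Return (index, squared distance) of the codeword closest to v."""
    best_i, best_d = 0, sq_dist(codebook[0], v)
    for i in range(1, len(codebook)):
        d = sq_dist(codebook[i], v)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(vectors, size, eps=0.01, iters=20):
    """Train a codebook by LBG-style splitting.

    Assumption: `size` is a power of two, since every split doubles
    the number of codewords.
    """
    codebook = [mean_vector(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = ([[x + eps for x in c] for c in codebook]
                    + [[x - eps for x in c] for c in codebook])
        # Refine with nearest-neighbor assignment and centroid updates.
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                idx, _dist = nearest(codebook, v)
                cells[idx].append(v)
            # An empty cell keeps its previous codeword.
            codebook = [mean_vector(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def avg_distortion(codebook, vectors):
    """Average quantization distortion of the vectors under a codebook."""
    return sum(nearest(codebook, v)[1] for v in vectors) / len(vectors)


def identify(codebooks, vectors):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks,
               key=lambda name: avg_distortion(codebooks[name], vectors))


def synthetic_frames(center, n, rng):
    """Hypothetical stand-in for real feature frames: Gaussian points."""
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]


rng = random.Random(0)
train = {
    "speaker_a": synthetic_frames([0.0, 0.0], 200, rng),
    "speaker_b": synthetic_frames([3.0, 3.0], 200, rng),
}
codebooks = {name: lbg(frames, size=4) for name, frames in train.items()}
test_frames = synthetic_frames([3.0, 3.0], 50, rng)
print(identify(codebooks, test_frames))
```

Each speaker gets one codebook trained on that speaker's frames only. An unknown utterance is then assigned to the speaker whose codebook quantizes it with the least average distortion.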


Descriptors :   *NEURAL NETS , *SPEECH RECOGNITION , *DATA RATE , TEST AND EVALUATION , DATA BASES , HUMANS , IDENTIFICATION , QUALITY , PREPROCESSING , AUDITORY NERVE , NERVES , DISTORTION , QUANTIZATION , CLASSIFICATION , CODING , PROCESSING , MODELS , ALGORITHMS


Subject Categories : Anatomy and Physiology
      Electrical and Electronic Equipment
      Recording and Playback Devices
      Voice Communications


Distribution Statement : APPROVED FOR PUBLIC RELEASE