Accession Number : AD1031245


Title :   Performance Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer


Descriptive Note : Journal Article - Open Access


Corporate Author : MASSACHUSETTS INST OF TECH LEXINGTON LEXINGTON United States


Personal Author(s) : Quatieri,Thomas F ; Mehta,Daryush D ; Chien,Yu-Ren ; Gudnason,Jon ; Zanartu,Matias


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/1031245.pdf


Report Date : 05 Jan 2017


Pagination or Media Count : 11


Abstract : Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical assessment. Nonetheless, evaluation of inverse filtering performance has been challenging due to the practical difficulty in measuring the true glottal signals while speech signals are recorded. Apart from this, it is suspected that the performance of many methods degrades in conditions that are of great interest, such as breathy voice, high pitch, soft/loud voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal signals generated by a physiologically relevant speech synthesizer. The synthesizer provides a realistic simulation of the voice production process, and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous running speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy) and subglottal pressure levels to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. For two vowel-specific data subsets that were isolated for two open vowels and analyzed with three closed-phase approaches, the resulting waveform errors had mean and standard deviation values below 20% and 10%, respectively, of the true glottal source amplitude. These approaches also showed remarkable stability across different voice qualities and subglottal pressure levels.
Results of data subset analysis suggest that analysis of close rounded vowels


Descriptors :   waveforms , algorithms , filtration , filters , time domain , simulations , frequency response , frequency bands , discrete fourier transforms , speech processing


Subject Categories : Statistics and Probability
      Linguistics


Distribution Statement : APPROVED FOR PUBLIC RELEASE