Machine Recognition vs Human Recognition of Voices
AIR FORCE RESEARCH LAB ROME NY
While automated speaker recognition by machines can be quite good, as demonstrated in the NIST Speaker Recognition Evaluations, performance can still suffer when environmental conditions, emotions, or recording quality change. This research examines how robust humans are compared with machines at speaker recognition in changing environments. Several data conditions, including short sentences, frequency-selective noise, and time-reversed speech, were used to test the robustness of human listeners versus machine algorithms. Statistical significance tests were completed on the results and, under these conditions, human speaker recognition was more robust. The strength of the human listeners was especially evident for the challenging case of noise in the 2000-3000 Hz frequency range. Additional analysis was performed to identify factors that may affect a listener's ability to identify a speaker. For example, the amount of voiced or unvoiced speech was examined to see whether it correlated with how easily a speaker's voice was recognized; it did not correlate strongly. Other factors, such as fundamental pitch, formant locations, pitch shimmer, pitch jitter, and other modulation measures, are also being examined. The original goal of this effort was to discover which frequency bands are most important for the familiar speaker recognition task. This research was a cursory look at what frequency information is important for speaker identification. More listening experiments with better randomization of stimuli and phonetic consideration are required.
- Voice Communications