Evaluation of Speech Synthesis Systems using the Speech Reception Threshold Methodology
TNO HUMAN FACTORS SOESTERBERG (NETHERLANDS) THERMAL PHYSIOLOGY GROUP
The intelligibility of currently available speech synthesis systems is usually high enough that systems can be compared on speech quality alone. However, in some situations, such as a civil aircraft cockpit, the acoustic environment may be such that intelligibility becomes a discriminating factor between systems. In this paper we propose a methodology for comparing speech synthesis systems based on the Speech Reception Threshold (SRT). This method finds the signal-to-noise ratio at which 50% intelligibility of redundant sentences is reached. A system with a lower SRT value is said to be more robust against masking noise. We compared 5 commercial speech synthesis systems (4 male voices, 5 female voices) in an SRT experiment using a masking noise that was spectrally equivalent to cockpit noise. SRT values ranged from -4.1 dB to 1.1 dB. An ANOVA revealed that two of the nine systems had a significantly lower SRT than the rest. There was also an effect of the test subject, which is remarkable because the SRT usually shows little variability across listeners.
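The abstract does not say how the 50% point is tracked. As an illustration only, the sketch below assumes a simple 1-up/1-down adaptive staircase, a common way to converge on the 50% point: the signal-to-noise ratio is lowered after a correctly repeated sentence and raised after an error, and the SRT is estimated as the mean of the presented levels once the first few trials are discarded. The function names, the step size, the trial counts, and the simulated logistic listener are all assumptions made for this example and are not taken from the report.

```python
import math
import random
from typing import Callable, List, Optional


def logistic_listener(true_srt_db: float,
                      slope_per_db: float = 0.6,
                      rng: Optional[random.Random] = None) -> Callable[[float], bool]:
    """Hypothetical simulated listener (an assumption, not from the report).

    The probability of repeating a sentence correctly follows a logistic
    function of the SNR and equals 0.5 when the SNR equals true_srt_db.
    """
    rng = rng or random.Random(0)

    def respond(snr_db: float) -> bool:
        p_correct = 1.0 / (1.0 + math.exp(-slope_per_db * (snr_db - true_srt_db)))
        return rng.random() < p_correct

    return respond


def estimate_srt(respond: Callable[[float], bool],
                 start_snr_db: float = 0.0,
                 step_db: float = 2.0,
                 n_trials: int = 30,
                 n_discard: int = 4) -> float:
    """Estimate the SRT with a 1-up/1-down staircase (assumed procedure).

    respond(snr_db) returns True when the sentence was repeated correctly.
    """
    snr_db = start_snr_db
    presented: List[float] = []
    for _ in range(n_trials):
        presented.append(snr_db)
        if respond(snr_db):
            snr_db -= step_db  # correct: make the next sentence harder
        else:
            snr_db += step_db  # incorrect: make the next sentence easier
    kept = presented[n_discard:]  # drop the initial approach to threshold
    return sum(kept) / len(kept)


if __name__ == "__main__":
    listener = logistic_listener(true_srt_db=-3.0)
    print(f"Estimated SRT: {estimate_srt(listener):.1f} dB")
```

Under this assumed procedure, comparing synthesis systems amounts to running one such track per system and listener and then comparing the resulting SRT estimates, with lower values indicating greater robustness against the masking noise.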
- Anatomy and Physiology