Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation
AIR FORCE RESEARCH LAB WRIGHT-PATTERSON AFB OH HUMAN EFFECTIVENESS DIRECTORATE
Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has included an optional unsupervised speaker adaptation track in which test files are processed sequentially and the target model may be updated. In this paper, various model adaptation techniques are implemented using a supervised (ideal) adaptation scheme. Once the best-performing model adaptation method is found, unsupervised adaptation experiments are run using a threshold to determine when to update the target model. Three NIST training conditions, 10sec4w, 1conv4w, and 8conv4w, all with the 1conv4w test condition, are used for experiments with the NIST 2005 SRE. With supervised adaptation, minDCF values are reduced relative to the baseline from 0.0708 to 0.0277 for 10sec4w, from 0.0385 to 0.0199 for 1conv4w, and from 0.0264 to 0.0176 for 8conv4w. With unsupervised adaptation, minDCF values are reduced to 0.0590, 0.0302, and 0.0210 for the respective training conditions.
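The threshold-gated update described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model type, scoring function, or threshold value, so everything named here is a hypothetical stand-in: `RunningMeanModel` is a toy one-dimensional model, not the paper's system, and `adapt_sequentially` and its `threshold` argument are illustrative names. The sketch only shows the control flow: each test item is scored against the current target model, and the model is updated only when the score reaches the threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RunningMeanModel:
    """Toy target model (hypothetical): a running mean of scalar features."""
    mean: float
    count: int = 1

    def score(self, x: float) -> float:
        # Higher is better: negative absolute distance to the model mean.
        return -abs(x - self.mean)

    def update(self, x: float) -> None:
        # Incremental mean update with the newly accepted observation.
        self.count += 1
        self.mean += (x - self.mean) / self.count


def adapt_sequentially(model: RunningMeanModel,
                       test_items: Sequence[float],
                       threshold: float) -> List[float]:
    """Score test items in order; adapt the model when score >= threshold."""
    scores = []
    for x in test_items:
        s = model.score(x)      # score with the model as it stands now
        scores.append(s)
        if s >= threshold:      # unsupervised decision: no truth labels used
            model.update(x)
    return scores


model = RunningMeanModel(mean=0.0)
scores = adapt_sequentially(model, [1.0, 5.0, 2.0], threshold=-1.5)
```

In the supervised (ideal) scheme the update decision would instead use the true target/non-target label of each test item, which is why it serves as a best-case reference for the threshold-based variant.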
- Voice Communications