Towards Link Characterization from Content
RUTGERS - THE STATE UNIV PISCATAWAY NJ
Pagination or Media Count:
In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. Such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting this biased distribution involves exploiting uncertain knowledge of the classifier error patterns. The Metropolis-Hastings algorithm allows us to construct a Bayes estimator for the true class proportions. We experimentally evaluate this algorithm for a speaker recognition task.
- Voice Communications