Word and Subword Modelling in a Segment-Based HMM Word Spotter Using a Data Analytic Approach
MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE
In this work we focus on methods for representing acoustic-phonetic knowledge in a speech recognizer and for analyzing the system's behavior in detail. The testbed for developing these methods is a segment-based hidden Markov model (HMM) recognizer. In this system, measurements are made on variable-duration segments. Ideally, each segment is associated with a single phonetic unit, which we refer to as a phone. This scheme has several potential advantages over the typical HMM recognizer, which is based on fixed-duration frames: a greater ability to model statistical dependence among spectral measurements, a more convenient framework for representing acoustic-phonetic knowledge, and a potential reduction in computation, since the mean segment rate in our implementation is 1/5 of a typical frame rate. The HMM framework is used to model the segmenter's deviations from the ideal behavior of one segment per phone. We employ an HMM topology that allows a phone to be associated with more than one segment, while biphone HMMs model instances in which a segment is associated with more than one phone. We compared the effectiveness of various segment measurement sets on a phonetic recognition task. The measurements consisted of short-time spectral representations measured at particular positions relative to segment boundaries. The key result was that adding spectra measured outside the segment to those measured inside it led to a significant improvement in performance. For the task of recognizing 39 phone labels, the best system attained a phonetic accuracy (percent correct minus percent insertions) of 59% (95% confidence interval: 53-65%) on a set of nine male speakers from the VOYAGER corpus, a result in the range of those previously reported for recognizers of comparable complexity.
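The abstract does not spell out the measurement sets, so what follows is only a minimal sketch of the general idea of turning a variable-duration segment into a fixed-length vector of spectra measured inside it and, optionally, just outside its boundaries. It assumes that short-time spectra are available at a fixed frame rate and that segment boundaries are given as frame indices. It also assumes that the segment interior is summarized by averaging equal sub-regions and that context is taken as the average of a few frames on each side. The names `segment_measurements`, `n_inside`, `outside`, and `include_outside` are hypothetical illustrations. They are not the measurement sets actually compared in this work.

```python
from typing import List, Sequence

Frame = List[float]  # one short-time spectral vector, e.g. filter-bank energies


def average_frames(frames: Sequence[Frame], lo: int, hi: int) -> Frame:
    """Element-wise mean of frames[lo:hi] (hi exclusive, requires hi > lo)."""
    n = hi - lo
    dim = len(frames[0])
    return [sum(frames[t][d] for t in range(lo, hi)) / n for d in range(dim)]


def segment_measurements(
    frames: Sequence[Frame],
    start: int,
    end: int,
    n_inside: int = 3,   # hypothetical: number of equal sub-regions inside the segment
    outside: int = 3,    # hypothetical: frames averaged on each side of the segment
    include_outside: bool = True,
) -> Frame:
    """Fixed-length feature vector for the variable-duration segment [start, end)."""
    if not 0 <= start < end <= len(frames):
        raise ValueError("segment must be a non-empty span of frames")
    length = end - start

    # Spectra inside the segment: average each of n_inside roughly equal parts,
    # so segments of any duration yield the same number of measurements.
    inside: Frame = []
    for i in range(n_inside):
        lo = start + (i * length) // n_inside
        hi = start + ((i + 1) * length) // n_inside
        if hi <= lo:  # segment shorter than n_inside frames: reuse a single frame
            hi = lo + 1
        inside.extend(average_frames(frames, lo, hi))

    if not include_outside:
        return inside

    # Spectra outside the segment: context just before the start boundary and
    # just after the end boundary, falling back to the edge frame at utterance edges.
    before_lo, before_hi = max(0, start - outside), start
    if before_hi <= before_lo:
        before_lo, before_hi = 0, 1
    after_lo, after_hi = end, min(len(frames), end + outside)
    if after_hi <= after_lo:
        after_lo, after_hi = len(frames) - 1, len(frames)

    return (
        average_frames(frames, before_lo, before_hi)
        + inside
        + average_frames(frames, after_lo, after_hi)
    )


if __name__ == "__main__":
    # Toy utterance: 10 frames of 2-dimensional "spectra".
    frames = [[float(t), 10.0 * t] for t in range(10)]
    print(segment_measurements(frames, 3, 9, outside=2, include_outside=False))
    print(segment_measurements(frames, 3, 9, outside=2))
```

In this sketch, the inside-only and inside-plus-outside vectors differ only in the two context spectra concatenated at either end. That is the kind of comparison between measurement sets the abstract describes, with a classifier trained on each variant. Averaging over equal sub-regions is just one simple way to obtain a fixed dimensionality from segments of varying length.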
- Voice Communications