Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition
Carnegie Mellon University, Pittsburgh, PA, Department of Computer Science
To model the acoustics of a large vocabulary well while staying within a reasonable memory budget, most speech recognition systems use phonetic models to share parameters across different words in the vocabulary. This dissertation investigates the merits of modeling at the subphonetic level and demonstrates that sharing parameters at the subphonetic level yields more accurate acoustic models than sharing at the phonetic level. The concept of subphonetic parameter sharing can be applied to any class of parametric models; since the first-order hidden Markov model (HMM) has been overwhelmingly successful in speech recognition, this dissertation bases all its studies and experiments on HMMs.

The subphonetic unit we investigate is the state of phonetic HMMs. We develop a system in which similar Markov states of phonetic models share the same Markov parameters. The shared parameter (i.e., the output distribution associated with a cluster of similar states) is called a senone because of its state dependency, and the phonetic models that share senones are called shared-distribution models (SDMs). Experiments show that SDMs offer more accurate acoustic models than the generalized-triphone models presented by Lee.

Senones are then applied to provide accurate models for triphones not observed in the system's training data. Two approaches for modeling unseen triphones are studied: purely decision-tree-based senones and a hybrid approach using the concept of Markov state quantization. Both approaches offer a significant error reduction over the previously accepted approach of monophone model substitution; however, the purely decision-tree-based senone approach is preferred for its simplicity.
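The clustering of similar Markov states into senones can be sketched as a greedy agglomerative procedure over the states' output distributions. The entropy-based merge cost and the toy discrete distributions below are illustrative assumptions for this sketch, not the dissertation's exact algorithm:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete output distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def merge_cost(p, cp, q, cq):
    """Increase in count-weighted entropy if two state distributions merge.

    cp, cq are the training counts of the two states; a small cost means
    the distributions are similar and can safely share one senone.
    """
    merged = (cp * p + cq * q) / (cp + cq)
    return (cp + cq) * entropy(merged) - cp * entropy(p) - cq * entropy(q)

def cluster_states(state_dists, counts, n_senones):
    """Greedily merge the closest pair of clusters until n_senones remain.

    Returns a dict mapping each original state index to its senone id.
    """
    clusters = [[i] for i in range(len(state_dists))]
    dists = [d.astype(float).copy() for d in state_dists]
    counts = list(counts)
    while len(clusters) > n_senones:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                c = merge_cost(dists[i], counts[i], dists[j], counts[j])
                if best is None or c < best[0]:
                    best = (c, i, j)
        _, i, j = best
        total = counts[i] + counts[j]
        dists[i] = (counts[i] * dists[i] + counts[j] * dists[j]) / total
        counts[i] = total
        clusters[i] += clusters[j]
        del clusters[j], dists[j], counts[j]
    return {s: sid for sid, members in enumerate(clusters) for s in members}
```

For example, four hypothetical triphone states with output distributions `[0.9, 0.1]`, `[0.85, 0.15]`, `[0.1, 0.9]`, `[0.2, 0.8]` clustered into two senones end up with the first pair sharing one distribution and the second pair sharing the other. Models for unseen triphones would then be built by selecting senones for their states, e.g. via decision-tree questions about phonetic context.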
- Voice Communications