Accession Number : ADA256820


Title :   Word and Subword Modelling in a Segment-Based HMM Word Spotter Using a Data Analytic Approach


Descriptive Note : Doctoral thesis


Corporate Author : MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE


Personal Author(s) : Marcus, Jeffrey N


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a256820.pdf


Report Date : Sep 1992


Pagination or Media Count : 323


Abstract : In this work we focus on methods for representing acoustic-phonetic knowledge in a speech recognizer and for analyzing the system's behavior in detail. The testbed for developing these methods is a segment-based hidden Markov model (HMM) recognizer. In this system, measurements are made on variable-duration segments. Ideally, each segment is associated with a single phonetic unit, which we refer to as a phone. This scheme has several potential advantages over the typical HMM recognizer, which is based on fixed-duration frames: a greater ability to model statistical dependence among spectral measurements, a more convenient framework for representing acoustic-phonetic knowledge, and a potential reduction in computation, since the mean segment rate in our implementation is 1/5 of a typical frame rate. The HMM framework is used to model the segmenter's deviations from the ideal behavior of one segment per phone. We employ an HMM topology that allows a phone to be associated with more than one segment. Biphone HMMs model instances in which a segment is associated with more than one phone. We compared the effectiveness of various segment measurement sets on a phonetic recognition task. The measurements consisted of short-time spectral representations measured at particular positions relative to segment boundaries. The key result was that adding spectra measured outside the segment to those measured inside it led to a significant improvement in performance. For the task of recognizing 39 phone labels, the best system attained a phonetic accuracy (% correct - % insertions) of 59% (95% confidence interval of 53-65%) on a set of nine male speakers from the VOYAGER corpus, a result in the range of those previously reported for recognizers of comparable complexity.
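The accuracy figure in the abstract is defined as percent correct minus percent insertions. The following is a minimal sketch of that metric in Python, assuming the usual convention that both percentages are taken relative to the number of reference phones, and that substitution, deletion, and insertion counts come from a prior alignment of the recognized phone string against the reference transcription. The function name and the example counts are hypothetical illustrations, not taken from the thesis.

```python
def phonetic_accuracy(n_ref: int, substitutions: int, deletions: int, insertions: int) -> float:
    """Hypothetical helper: accuracy = % correct - % insertions.

    Assumes all counts come from an alignment of hypothesis vs. reference
    phones, and that percentages are relative to n_ref (reference phones).
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be a positive number of reference phones")
    # Phones matched correctly: reference phones not substituted or deleted.
    correct = n_ref - substitutions - deletions
    # % correct - % insertions, both normalized by the reference count.
    return 100 * (correct - insertions) / n_ref


if __name__ == "__main__":
    # Made-up counts: 1000 reference phones, 250 substitutions,
    # 100 deletions, 60 insertions -> 65% correct, 6% insertions.
    print(phonetic_accuracy(1000, 250, 100, 60))
```

Because insertions are subtracted, this measure can fall below zero for a recognizer that hypothesizes many spurious phones. That makes it stricter than percent correct alone, which ignores insertions entirely.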


Descriptors :   *SPEECH RECOGNITION , *SYSTEMS ANALYSIS , *ACOUSTIC DATA , *PHONETICS , MEASUREMENT , COMPUTATIONS , MODELS , EDGES , RATES , ACCURACY , REDUCTION , ADDITION , LABELS , MEAN , SPEECH , FRAMES , BEHAVIOR , WORK , MALES , ACOUSTICS , TIME , VARIABLES , BOUNDARIES , SPECTRA , TOPOLOGY , RECOGNITION


Subject Categories : Linguistics
      Voice Communications


Distribution Statement : APPROVED FOR PUBLIC RELEASE