Segmentation and Labeling of Speech: A Comparative Performance Evaluation
CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE
This thesis studies speech recognition at the parametric level. It attempts to evaluate and understand the relative merits of a number of alternative design choices at that level. In particular, it investigates segmentation and labeling techniques and the use of parametric representations for the acoustic signal. Every speech recognition system employs some parametric representation and some initial signal-to-symbol transformation. The author shows the performance currently available for these initial processes, and asserts that such performance is comparable to human performance. After presenting the relative merits of some typical parametric representations, the thesis develops a methodology for such comparative evaluation. Simple, parameter-independent schemes for segmenting, labeling, and training are also developed. The role of pattern classification techniques is clarified as it relates to the initial signal-to-symbol transformation. Four parametric representations were chosen for study: a set of amplitude and zero-crossing measurements from 5 octave filters; a set of energy measurements from a 13 octave filter bank; a smoothed, short-time spectrum computed from the LPC filter; and the LPC coefficients themselves. Note that the first two involve the use of analog devices. Each method yields a set of measurements at uniform, short intervals, referred to as a pattern. Distance functions, chosen from pattern classification theory, are then applied to the parameter patterns as measures of acoustic similarity.
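The idea of applying a distance function to parameter patterns can be illustrated with a minimal sketch. The abstract does not say which distance functions, thresholds, or training procedure the thesis uses, so everything here is an assumption made for illustration: Euclidean distance stands in for the distance function, a fixed threshold on frame-to-frame distance stands in for the segmenter, and nearest-template matching stands in for the labeler. The names `euclidean`, `segment`, `label`, and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment(frames, threshold):
    """Return the frame indices at which a new segment starts.

    A boundary is placed wherever adjacent frames differ by more than
    `threshold`. This rule is an illustrative assumption, not the thesis's scheme.
    """
    boundaries = [0]
    for i in range(1, len(frames)):
        if euclidean(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries


def label(frame, templates):
    """Return the name of the template closest to `frame`."""
    return min(templates, key=lambda name: euclidean(frame, templates[name]))


# Toy two-dimensional patterns sampled at uniform intervals (made-up values).
frames = [[1.0, 0.0], [1.1, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Hypothetical reference templates, as if obtained from a training step.
templates = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

boundaries = segment(frames, threshold=0.5)
labels = [label(frames[b], templates) for b in boundaries]
print(boundaries, labels)
```

Because the segmenter and labeler only call `euclidean` on whole vectors, the same code would run unchanged on any of the four representations, which is one way to read "parameter-independent" in the abstract.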
- Computer Programming and Software
- Voice Communications