Accession Number:

ADA289344

Title:

Isolated Digit Recognition Without Time Alignment.

Descriptive Note:

Master's thesis,

Corporate Author:

AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH

Personal Author(s):

Report Date:

1994-12-01

Pagination or Media Count:

148

Abstract:

This thesis examines methods for isolated digit recognition without time alignment. Resource requirements for isolated word recognizers that use time alignment can become prohibitively large as the vocabulary to be classified grows. Methods that avoid time alignment yet achieve recognition rates comparable to those of current time-aligned methods are therefore needed. The goals of this research are to find feature sets for speech recognition that perform well without time alignment, and to identify classifiers that perform well with these features. Using the digits from the TI46 database, baseline speaker-independent recognition rates of 95.2% for the complete speaker set and 98.1% for the male speaker set are established using dynamic time warping (DTW). This work begins with features derived from spectrograms of each digit. These critical band energy features, based on a critical band frequency scale covering the telephone bandwidth (300-3000 Hz), are classified alone and in combination with several other feature sets, using several different classifiers. With this method, each word is represented by a single short feature vector. For speaker-independent recognition using the complete speaker set and a multi-layer perceptron (MLP) classifier, a recognition rate of 92.4% is achieved. For the same classifier with the male speaker set, a recognition rate of 97.1% is achieved. For the male speaker set, there is no statistically significant difference between the results using DTW and those using the MLP without time alignment. This shows that some feature sets can provide high recognition rates for isolated word recognition without the need for time alignment.
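The central idea above is that each spoken word is reduced to one short, fixed-length vector of critical band energies, so no frame-by-frame alignment (as in DTW) is needed before classification. The abstract does not give the exact band edges, frame length, or how the spectrogram is collapsed into a single vector, so the following is only a minimal sketch under stated assumptions: Zwicker's critical band edges clipped to 300-3000 Hz, a Hamming-windowed direct DFT, and an assumed scheme that averages log band energy over a fixed number of equal time segments. The names `BAND_EDGES_HZ`, `band_energies`, and `word_feature_vector` are hypothetical.

```python
import math

# Assumption: Zwicker critical band edges (Hz), clipped to the 300-3000 Hz
# telephone band. The thesis's exact band edges are not stated in the abstract.
BAND_EDGES_HZ = [300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3000]


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N/2 using a direct DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum


def band_energies(frame, sample_rate):
    """Sum the spectral power of one frame into each critical band."""
    n = len(frame)
    # Hamming window to reduce spectral leakage between bands.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    energies = [0.0] * (len(BAND_EDGES_HZ) - 1)
    for k, power in enumerate(power_spectrum(windowed)):
        freq = k * sample_rate / n
        for b in range(len(energies)):
            if BAND_EDGES_HZ[b] <= freq < BAND_EDGES_HZ[b + 1]:
                energies[b] += power
                break
    return energies


def word_feature_vector(signal, sample_rate, frame_len=256, n_segments=3):
    """Hypothetical scheme: one fixed-length vector per word.

    The word is cut into non-overlapping frames, the frames are split into
    n_segments equal groups regardless of word duration, and the mean log
    band energy of each group is concatenated. Words of any length thus map
    to vectors of the same size, with no time alignment between words.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if len(frames) < n_segments:
        raise ValueError("utterance too short for the requested number of segments")
    per_frame = [band_energies(f, sample_rate) for f in frames]
    n_bands = len(BAND_EDGES_HZ) - 1
    features = []
    for s in range(n_segments):
        start = s * len(per_frame) // n_segments
        stop = (s + 1) * len(per_frame) // n_segments
        segment = per_frame[start:stop]
        for b in range(n_bands):
            mean_energy = sum(fr[b] for fr in segment) / len(segment)
            features.append(math.log10(mean_energy + 1e-10))  # floor avoids log(0)
    return features
```

Because every utterance yields a vector of `n_segments * 13` values, the vectors can be passed directly to a fixed-input classifier such as an MLP; the cost of this simplicity is that only coarse temporal structure survives.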

Subject Categories:

  • Cybernetics
  • Voice Communications

Distribution Statement:

APPROVED FOR PUBLIC RELEASE