Speech Recognition Using Neural Nets and Dynamic Time Warping
AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL OF ENGINEERING
The purpose of this study is to demonstrate the feasibility of using Kohonen neural nets in speech recognition. This is done by combining a first-level Kohonen net with a word recognition algorithm, which is either dynamic time warping (DTW) or a second Kohonen net. A digitized utterance is sliced and processed to obtain a sequence of 15-component vectors. Each component corresponds to the energy in a selected frequency range. An utterance of the digits zero through nine is used to train the first Kohonen net. After training, an utterance input to the net produces a trajectory through the net. Each point on the trajectory corresponds to a node and a particular sound. These trajectories are input to a word recognition algorithm. The first of these, DTW, compares unknown utterances to template utterances. It is a computationally intensive algorithm, and it was used primarily to test the preprocessing and neural net training procedures. The second algorithm is a second Kohonen neural net. Digits are assigned to each node so that when an unknown trajectory is input to the second net, the node that lights up identifies the utterance. Using DTW, recognition rates of 99% for isolated speech and 93% for connected speech are achieved. With the second Kohonen net, isolated speech is recognized at rates of up to 96%, depending upon the net format. Recommendations for future effort include increasing the vocabulary, using multiple feature sets and nets to attempt speaker-independent speech recognition, and substituting a backpropagation multi-layer perceptron net for word recognition. Theses.
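The DTW template-matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that a trajectory is a list of (row, column) node positions on the first Kohonen net, that the local cost is the Euclidean distance between node positions, and that the classic three-way step pattern is used. The names `node_distance`, `dtw_distance`, and `recognize`, and the sample trajectories, are hypothetical.

```python
import math


def node_distance(a, b):
    """Euclidean distance between two (row, col) node positions (assumed local cost)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dtw_distance(traj_a, traj_b):
    """Accumulated cost of the best time alignment between two trajectories."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i points of
    # traj_a with the first j points of traj_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = node_distance(traj_a[i - 1], traj_b[j - 1])
            # Allow a stretch of either trajectory or a step along both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


# Made-up template trajectories for two words (illustrative data only).
templates = {
    "one": [(0, 0), (0, 1), (1, 2), (2, 2)],
    "two": [(3, 3), (3, 2), (2, 1), (1, 0)],
}

# A slower utterance of "one": the same nodes, some of them repeated.
unknown = [(0, 0), (0, 0), (0, 1), (1, 2), (1, 2), (2, 2)]

print(recognize(unknown, templates))  # prints "one"
```

Because DTW may repeat points of either trajectory, the slower utterance aligns with its template at zero cost. The nested loops cost time proportional to the product of the two trajectory lengths for every template, which is consistent with the abstract's remark that DTW is computationally heavy.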
- Computer Programming and Software
- Voice Communications