A Study in Speech Recognition Using a Kohonen Neural Network, Dynamic Programming, and Multi-Feature Fusion
Abstract:
The human perception system is multi-dimensional: humans process more than just the sound of the word. Any speech recognition system that mimics human speech perception will therefore need to be multi-dimensional. This premise formed the basis for the design approach used in this research effort. Linear Predictive Coefficients (LPC) and formants were used as distinct and independent inputs to a recognition system consisting of a Kohonen neural network and a dynamic programming word classifier. A feature-fusion section and a rule-based system were used to integrate the two input feature sets into one output result. This research effort involved extensive testing of the Kohonen network. Using a speech input signal, different Kohonen gain reduction methods, initial gain values, and conscience values were tested for various iteration times in an effort to quantify the response and capabilities of the Kohonen network. Three-dimensional Kohonen-Dynamic Programming surfaces were developed that graphically showed the effects of gain, conscience, and iteration time on the speech recognition response of a Kohonen neural network. A new standard iteration time, called a multiple, was used during training of the Kohonen networks. The results of the basic research on the Kohonen network produced an optimized Kohonen configuration that was used in the multiple-feature recognition system. A 70-word vocabulary of F-16 cockpit commands was used to evaluate the new feature-fusion method. Theses
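To make the recognition pipeline named in the abstract more concrete, the following is a minimal sketch of one feature channel: a Kohonen layer quantizes feature frames into node indices, and a dynamic programming (dynamic time warping) classifier compares the resulting node trajectory against stored word templates. The abstract does not give the actual algorithms, so everything specific here is an assumption made for illustration: the function names, the 1-D chain of nodes, the linear gain reduction, the linearly shrinking neighbourhood, the DeSieno-style conscience bias, and the use of distances between node weight vectors as the local alignment cost. The feature-fusion and rule-based stages are not sketched.

```python
import random


def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_kohonen(frames, n_nodes, iterations, gain0=0.5,
                  conscience=1.0, freq_rate=0.001, seed=0):
    """Train a 1-D chain of Kohonen nodes on feature frames.

    Assumed (not from the source): linear gain reduction, a linearly
    shrinking neighbourhood, and a DeSieno-style conscience bias.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for _ in range(n_nodes)]
    win_freq = [1.0 / n_nodes] * n_nodes  # running estimate of win rates
    for t in range(iterations):
        frac = 1.0 - t / iterations
        gain = gain0 * frac                  # linear gain reduction
        radius = int((n_nodes // 2) * frac)  # shrinking neighbourhood
        x = rng.choice(frames)
        # Conscience: a node that wins more often than 1/n_nodes of the time
        # is handicapped, so under-used nodes get a chance to win.
        scores = [sqdist(x, w) - conscience * (1.0 / n_nodes - p)
                  for w, p in zip(weights, win_freq)]
        winner = min(range(n_nodes), key=scores.__getitem__)
        for j in range(n_nodes):
            won = 1.0 if j == winner else 0.0
            win_freq[j] += freq_rate * (won - win_freq[j])
            if abs(j - winner) <= radius:
                # Move the node toward the input by the current gain.
                weights[j] = [w + gain * (xi - w)
                              for w, xi in zip(weights[j], x)]
    return weights


def quantize(frames, weights):
    """Map each frame to the index of its nearest node (no conscience)."""
    return [min(range(len(weights)), key=lambda j: sqdist(f, weights[j]))
            for f in frames]


def dtw(a, b, cost):
    """Dynamic programming alignment cost between sequences a and b."""
    inf = float("inf")
    table = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    table[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            best_prev = min(table[i - 1][j], table[i][j - 1],
                            table[i - 1][j - 1])
            table[i][j] = cost(a[i - 1], b[j - 1]) + best_prev
    return table[len(a)][len(b)]


def classify(frames, templates, weights):
    """Return the label of the template trajectory closest under DTW."""
    path = quantize(frames, weights)

    def node_cost(i, j):
        # Local cost: distance between the two nodes' weight vectors.
        return sqdist(weights[i], weights[j])

    return min(templates,
               key=lambda label: dtw(path, templates[label], node_cost))
```

In this sketch, `gain0`, `conscience`, and `iterations` play the role of the initial gain, conscience value, and iteration time that the abstract describes varying; sweeping them and recording recognition accuracy is one way a response surface of the kind mentioned could be produced. A template dictionary would be built by calling `quantize` on one reference utterance per word.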