Parallel pattern verifier with dynamic time warping
First Claim
1. A speech recognition system, comprising:
an elementary recognizer for classifying the elementary segments of an observed speech pattern as they are received, said elementary recognizer including correlation means for producing at an output node of said elementary recognizer a score of correlation of said elementary segments with stored spectral speech patterns; and
a plurality of local decision modules each connected to said output node for receiving said score of correlation;
said plurality of local decision modules being connected at node points in a network wherein different network paths through the nodes and their corresponding local decision modules represent an accumulation of speech segments constituting different pronunciations of said speech pattern, the input of each said local decision module connected to said correlation means to receive the measures of correlation;
each local decision module specializing in a particular network node and including, means for determining the probability of how well the input segment of speech matches the particular sound segments associated with a given node, means for receiving from the other local decision modules the prior correlation scores of all preceding sound segments, means for selecting the locally optimum time warping of each segment of speech which are input from other local decision modules, and accumulator memory means for providing an accumulated correlation score for any one path in the network of local decision modules, said path representing an accumulation of segments or parts of a word or sound;
whereby the accumulated correlation score represents the most probable pronunciation of said speech pattern and the best recognition match derived from all the possible paths in the network of local decision modules.
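As an illustration of the elementary recognizer recited in the claim above, the following minimal sketch scores one speech segment against stored spectral patterns. The claim does not say how the correlation is computed, so the cosine-style normalized correlation used here is an assumption, and the function name `correlation_scores`, the template labels, and the three-value spectra are hypothetical.

```python
import math

def correlation_scores(segment, templates):
    """Score one spectral frame against every stored template.

    Assumption: normalized (cosine) correlation; the claim only says
    "score of correlation" without naming a measure.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    seg_norm = norm(segment)
    scores = {}
    for label, tmpl in templates.items():
        dot = sum(a * b for a, b in zip(segment, tmpl))
        denom = seg_norm * norm(tmpl)
        # Guard against all-zero vectors to avoid division by zero.
        scores[label] = dot / denom if denom else 0.0
    return scores

# Hypothetical stored spectral patterns for two sound segments.
templates = {"s": [0.1, 0.2, 0.9], "a": [0.9, 0.3, 0.1]}
scores = correlation_scores([0.8, 0.4, 0.1], templates)
best = max(scores, key=scores.get)  # label with the highest correlation
```

In the claimed arrangement these scores would be broadcast from the single output node to every local decision module, rather than being reduced to one best label as the last line does for demonstration.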
Abstract
A speech recognition system is disclosed which employs a network of elementary local decision modules for matching an observed time-varying speech pattern against all possible time warpings of the stored prototype patterns. For each elementary speech segment, an elementary recognizer provides a score indicating the degree of correlation of the input speech segment with stored spectral patterns. Each local decision module receives the results of the elementary recognizer and, at the same time, receives an input from selected ones of the other local decision modules. Each local decision module specializes in a particular node in the network, wherein each node determines the probability of how well the input segment of speech matches the particular sound segments in the sounds of the words spoken. Each local decision module takes the prior decisions of all preceding sound segments which are input from the other local decision modules and makes a selection of the locally optimum time warping to be permitted. By this selection technique, each speech segment is stretched or compressed by an arbitrary, nonlinear function based on the control of the interconnections of the other local decision modules to a particular local decision module. Each local decision module includes an accumulator memory which stores the logarithmic probabilities of the current observation, conditional upon the internal event specified by a word to be matched or by an identifier of the particular pattern that corresponds to the subject node for that particular pattern. For each observation, these probabilities are computed and loaded into the accumulator memory of all the modules, and the result of the locally optimum time warping, representing the accumulated score or network path to a node for the word with the highest probability, is chosen.
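The accumulation described in the abstract can be sketched as a small dynamic program, one accumulator per node. This is a sequential simulation of what the patent describes as parallel modules, not the disclosed hardware. The names `recognize` and `toy_log_prob`, the toy vocabulary, and the choice of allowed predecessors (same node to stretch, previous node, node two back to compress) are all assumptions made for illustration; in the patent the permitted warpings are set by how the modules are interconnected.

```python
import math

NEG_INF = float("-inf")

def recognize(observations, words, log_prob):
    """words: {word: [segment labels]}; log_prob(obs, label) -> log P(obs | label)."""
    # One accumulator per (word, node), mirroring one local decision module per node.
    acc = {w: [NEG_INF] * len(segs) for w, segs in words.items()}
    for t, obs in enumerate(observations):
        new_acc = {}
        for w, segs in words.items():
            row = []
            for i, label in enumerate(segs):
                if t == 0:
                    # Every path must start at the first node of a word.
                    best_prev = 0.0 if i == 0 else NEG_INF
                else:
                    # Locally optimum warping: pick the best-scoring predecessor.
                    # Assumed links: stay (stretch), advance by 1, skip 1 (compress).
                    best_prev = max(acc[w][j] for j in (i, i - 1, i - 2) if j >= 0)
                row.append(best_prev + log_prob(obs, label))
            new_acc[w] = row
        acc = new_acc  # all modules update together for each observation
    finals = {w: row[-1] for w, row in acc.items()}
    return max(finals, key=finals.get), finals

def toy_log_prob(obs, label):
    # Hypothetical stand-in for the elementary recognizer's output.
    return math.log(0.9) if obs == label else math.log(0.1)

words = {"cat": ["k", "a", "t"], "cab": ["k", "a", "b"]}
best, finals = recognize(["k", "k", "a", "a", "a", "t"], words, toy_log_prob)
```

Here the six observations are warped onto the three nodes of each word, and the word whose final node holds the highest accumulated log probability is selected. Working in log probabilities lets each module combine scores by addition, which matches the abstract's accumulator memory.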
263 Citations
9 Claims
Specification