Parallel pattern verifier with dynamic time warping

US 4,348,553 A
Filed: 07/02/1980
Issued: 09/07/1982
Est. Priority Date: 07/02/1980
Status: Expired due to Term


Abstract

A speech recognition system is disclosed which employs a network of elementary local decision modules for matching an observed time-varying speech pattern against all possible time warpings of the stored prototype patterns. For each elementary speech segment, an elementary recognizer provides a score indicating the degree of correlation of the input speech segment with stored spectral patterns. Each local decision module receives the results of the elementary recognizer and, at the same time, receives an input from selected ones of the other local decision modules. Each local decision module specializes in a particular node in the network wherein each node matches the probability of how well the input segment of speech matches the particular sound segments in the sounds of the words spoken. Each local decision module takes the prior decisions of all preceding sound segments which are input from the other local decision modules and makes a selection of the locally optimum time warping to be permitted. By this selection technique, each speech segment is stretched or compressed by an arbitrary, nonlinear function based on the control of the interconnections of the other local decision modules to a particular local decision module. Each local decision module includes an accumulator memory which stores the logarithmic probabilities of the current observation which is conditional upon the internal event specified by a word to be matched or identifier of the particular pattern that corresponds to the subject node for that particular pattern. For each observation, these probabilities are computed and loaded into the accumulator memory of all the modules and, the result of the locally optimum time warping representing the accumulated score or network path to a node for the word with the highest probability is chosen.
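The per-observation update described in the abstract can be sketched in software as follows. This is only an illustrative sketch, not the patented hardware: the class `LocalDecisionModule`, the function `run_cycle`, the dictionary layout, and the choice to seed the first node with a score of 0.0 before any observation are all assumptions made for the example. Each module keeps one accumulated log score, looks only at the modules wired to its input, and every module is updated from the same snapshot of prior scores so that all of them work on the same observation in a given cycle.

```python
NEG_INF = float("-inf")


class LocalDecisionModule:
    """One module per network node (hypothetical software stand-in)."""

    def __init__(self, node, incoming):
        # incoming maps a predecessor node i to the log transition score a(i, j).
        self.node = node
        self.incoming = incoming
        # Accumulator memory: best log score of any path ending at this node.
        self.score = NEG_INF

    def step(self, prev_scores, local_log_score):
        # Select the locally optimum predecessor among the connected modules,
        # then add the elementary recognizer's score for this node.
        best = max(
            (prev_scores[i] + log_a for i, log_a in self.incoming.items()),
            default=NEG_INF,
        )
        return best + local_log_score


def run_cycle(modules, recognizer_scores):
    """Advance every module by one observation from a shared snapshot."""
    prev = {m.node: m.score for m in modules}
    new = {m.node: m.step(prev, recognizer_scores[m.node]) for m in modules}
    for m in modules:
        m.score = new[m.node]
    return new


# Two-node example: node 0 may repeat, node 1 may be entered from 0 or repeat.
m0 = LocalDecisionModule(0, {0: -1.0})
m1 = LocalDecisionModule(1, {0: -1.0, 1: -1.0})
m0.score = 0.0  # assumed starting condition
first = run_cycle([m0, m1], {0: -2.0, 1: -3.0})
second = run_cycle([m0, m1], {0: -5.0, 1: -0.5})
```

Which predecessors appear in `incoming` is what limits the permitted time warpings: a self-loop stretches a segment over several observations, while an arc that skips a node compresses it.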

263 Citations

9 Claims

1. A speech recognition system, comprising:
- an elementary recognizer for classifying the elementary segments of an observed speech pattern as they are received, said elementary recognizer including correlation means for producing at an output node of said elementary recognizer a score of correlation of said elementary segments with stored spectral speech patterns; and
  
  a plurality of local decision modules each connected to said output node for receiving said score of correlation;
  
  said plurality of local decision modules being connected at node points in a network wherein different network paths through the nodes and their corresponding local decisions modules represent an accumulation of speech segments constituting different pronunciations of said speech pattern, the input of each said local decision module connected to said correlation means to receive the measures of correlation;
  
  each local decision module specializing in a particular network node and including, means for determining the probability of how well the input segment of speech matches the particular sound segments associated with a given node, means for receiving from the other local decision modules the prior correlation scores of all preceding sound segments, means for selecting the locally optimum time warping of each segment of speech which are input from other local decision modules, and accumulator memory means for providing an accumulated correlation score for any one path in the network of local decision modules, said path representing an accumulation of segments or parts of a word or sound;
  
  whereby the accumulated correlation score represents the most probable pronunciation of said speech pattern and the best recognition match derived from all the possible paths in the network of local decision modules.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. A system as recited in claim 1, wherein said means for selecting the locally optimum time warping includes a transition likelihood memory which provides the logarithmic probabilities of the current observation for the particular speech segment prototype at a given node.
  - 3. A system as recited in claim 1, further comprising a word prototype controller at each local decision module for providing, to said accumulator memory means, prototype speech information which is specialized for its respective module.
  - 4. A system as recited in claim 3, further comprising, in each local decision module, a partial results memory connected to the output of said accumulator memory means for receiving, for each observation of a speech segment, both the result of the base input-transition probabilities from other modules and the current local observation from said accumulator memory means, said partial results memory providing its accumulated results to the other local decision modules.
  - 5. A system as recited in claim 1 wherein each local decision module is arranged in a network of nodes which is a representation of the possible ways to pronounce a given word, and further comprising timing means at each local decision module for time displacing each node by the number of arcs or patterns traveled in said network.
  - 6. A system as recited in claim 1, wherein each local decision module includes a means for calculating the highest input transition probabilities from each of the local decision modules, and means for providing said highest input transition probabilities to the other local decision modules so that the subject module is ready to process the next prototype pattern.
  - 7. A system as recited in claim 1, wherein said accumulator memory means in each local decision module stores said logarithmic probabilities of the current observation in accordance with the following fundamental equation for the dynamic programming solution to the maximum log probability with nonlinear time warping of a hidden Markov process;
    - $$\gamma(t,j) = \max_{i}\left[\gamma(t-1,i) + a(i,j)\right] + b[j,p(t)]$$
      
      wherein γ(t,j) represents the log probability for the best partial path which winds up at state j in the word prototype at time t thereby presenting the best path to arrive at a given node including all observations;
      
      t=1, 2, 3, . . . , T with all the local decision modules computing with the same value of t during a given cycle in the computation;
      
      γ(j,t) represents the present accumulated score up to and including the present score along the best possible path in the network;
      
      the symbol b[j,p(t)] represents the correlation score for a single elementary pattern segment at a particular time where an input segment (s,t) has a stored pattern (p,i,j);
      
      the term a(i,j) is the probability of going from position i to position j in the prototype for a single position step in the observed pattern;
      
      whereby the different values of i correspond to different positions within the prototype of a given word, and selection of a different value of i from time (t-1) to be connected to state j at time t represents the selection of a locally optimum dynamic time warping.
  - 8. A system as recited in claim 7, wherein said selector means includes multiplexor means for receiving and multiplexing the dynamic time warping outputs from each of the other local decision modules (γ(t-1,i));
    
    a transition likelihood memory which provides the log of transition probability (a(i,j)) of going from position i to position j in the prototype for a single position step in the observed pattern;
    
    an accumulator memory which receives the outputs from said transition likelihood memory and said multiplexor and provides the sum of said outputs;
    
    and comparator means connected to the output of said accumulator memory for comparing said output with other outputs for different values of i representing different positions of the prototype of a given word thereby selecting a locally optimum dynamic time warping Max γ(t-1,i)+a(i,j).
  - 9. A system as recited in claim 8, wherein an accumulator of local comparison adds the locally optimum time warping $\max_{i}\left[\gamma(t-1,i) + a(i,j)\right]$ provided by said selector means to a correlation score b(j,P(t)) provided by a word prototype controller.
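
The recurrence recited in claims 7 to 9 (take the maximum over i of γ(t-1,i) + a(i,j), then add the correlation score b for position j) can be illustrated with a short sequential sketch. This is a software approximation under stated assumptions, not the claimed parallel apparatus: the function name `best_warping`, the list-of-lists inputs, the `log_start` initialization (the claims do not spell one out), and the traceback step that recovers the chosen positions are all introduced here for illustration.

```python
NEG_INF = float("-inf")


def best_warping(log_a, log_b, log_start):
    """Log-domain dynamic programming over one word prototype.

    log_a[i][j]  : log transition score from position i to position j
    log_b[t][j]  : correlation score of observation t against position j
    log_start[j] : score for starting at position j (assumed, see lead-in)
    Returns (best final score, list of positions visited, one per observation).
    """
    n = len(log_start)
    gamma = [log_start[j] + log_b[0][j] for j in range(n)]
    back = []  # back[t-1][j] = predecessor chosen for position j at time t
    for t in range(1, len(log_b)):
        new_gamma, choices = [], []
        for j in range(n):
            # Comparator step: predecessor i maximizing gamma(t-1, i) + a(i, j).
            i_best = max(range(n), key=lambda i: gamma[i] + log_a[i][j])
            choices.append(i_best)
            # Accumulator step: add the local correlation score b for position j.
            new_gamma.append(gamma[i_best] + log_a[i_best][j] + log_b[t][j])
        gamma = new_gamma
        back.append(choices)
    # Trace the selected predecessors backward to recover the warping path.
    j = max(range(n), key=lambda k: gamma[k])
    best = gamma[j]
    path = [j]
    for choices in reversed(back):
        j = choices[j]
        path.append(j)
    path.reverse()
    return best, path


# Two-position prototype: stay in 0, move 0 -> 1, or stay in 1.
log_a = [[-1.0, -1.0], [NEG_INF, -0.5]]
log_b = [[-1.0, -5.0], [-4.0, -1.0], [-4.0, -1.0]]
score, path = best_warping(log_a, log_b, [0.0, NEG_INF])
```

In this example the second position absorbs two of the three observations, which is the kind of nonlinear stretching the choice of i at each step is meant to select. The inner loop over j is what the claims distribute across separate local decision modules.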

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Baker, James K.; Baker, Janet M.
Primary Examiner(s): Nusbaum, Mark E.
Assistant Examiner(s): Kemeny, E. S.
Application Number: US06/165,466
Time in Patent Office: 797 Days
Field of Search: 179/1 SB, 179/1 SD, 179/1 SC, 340/146.3 SY, 340/146.3 ED, 340/146.3 WD, 340/146.3 AQ, 364/728
US Class Current: 704/241
CPC Class Codes:
G06F 18/295   Markov models or related mo...
G06V 10/754   involving a deformation of ...
G10L 15/14   using statistical models, e...
