Training of Markov models used in a speech recognition system
Abstract
In a word, or speech, recognition system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input, wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model and wherein each probabilistic model has transition probability items and output probability items and wherein a value is stored for each of at least some probability items, the present invention relates to apparatus and method for determining probability values for probability items by biasing at least some of the stored values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word. Specifically, the current values of counts (from which probability items are derived) are adjusted by uttering a known word and determining how often probability events occur relative to (a) the model corresponding to the known uttered "correct" word and (b) the model of at least one other "incorrect" word. The current count values are increased based on the event occurrences relating to the correct word and are reduced based on the event occurrences relating to the incorrect word or words.
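The count-adjustment scheme in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the function names, the event keys, and the positive floor that keeps counts usable are all assumptions, not recited in the text above.

```python
# Illustrative sketch of the corrective count adjustment described above.
# Counts tied to the correct word's model are increased; counts tied to an
# incorrect word's model are decreased. All names here are hypothetical.

FLOOR = 1e-4  # assumed safeguard: keep counts positive so probabilities exist

def adjust_counts(current, plus, minus, floor=FLOOR):
    """current: count per probability event (e.g. a (state, transition) pair).
    plus:  event occurrences observed against the correct-word model.
    minus: event occurrences observed against an incorrect-word model."""
    adjusted = {}
    for event, value in current.items():
        value = value + plus.get(event, 0.0) - minus.get(event, 0.0)
        adjusted[event] = max(value, floor)
    return adjusted

def probabilities_from_counts(counts, events_by_state):
    """Derive each transition probability item from its count: the count
    divided by the total count of all transitions leaving the same state."""
    probs = {}
    for state, events in events_by_state.items():
        total = sum(counts[e] for e in events)
        for e in events:
            probs[e] = counts[e] / total
    return probs

current = {("S1", "t1"): 2.0, ("S1", "t2"): 2.0}
plus = {("S1", "t1"): 1.0}   # event occurrences from the correct (known) word
minus = {("S1", "t2"): 1.0}  # event occurrences from the incorrect word
adjusted = adjust_counts(current, plus, minus)
probs = probabilities_from_counts(adjusted, {"S1": [("S1", "t1"), ("S1", "t2")]})
```

Starting from an even split, the transition favored by the correct word ends at probability 0.75 and the other at 0.25, which is the biasing effect the abstract describes.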
21 Claims
1. In a system for decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state model and wherein each probabilistic model has transition probability items and output probability items and wherein a probability value is stored for each of at least some probability items, a method of determining probability values comprising the step of:
biasing at least some of the stored probability values to enhance the likelihood that outputs generated in response to communication of a known word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word.
2. A method of decoding a vocabulary word from outputs selected from an alphabet of outputs in response to a communicated word input, wherein each word in the vocabulary is represented by at least one probabilistic model, each probabilistic model having (i) stored transition probability values each representing the probability of a corresponding transition in a model being taken and (ii) stored output probability values each representing the probability of a corresponding output being produced at a given transition or transitions in a model, the method comprising the steps of:

(a) generating outputs in response to the communication of a known word input; and

(b) biasing at least some of the stored values to enhance the likelihood that the generated outputs are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word.

(Dependent claim: 3)
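Claims 1 and 2 turn on "the likelihood that the generated outputs are produced by the baseform." For a probabilistic finite state model with per-transition output probabilities, that likelihood can be computed with the standard forward recursion. The sketch below assumes a left-to-right model that starts in state 0 and finishes in state n_states - 1; these conventions and the names are assumptions for illustration, not part of the claims.

```python
def forward_likelihood(labels, n_states, trans_p, out_p):
    """Probability that the model produces the label string `labels`.
    trans_p[(i, j)]: probability of taking transition i -> j.
    out_p[(i, j, label)]: probability of producing `label` on that transition.
    Assumed conventions: start in state 0, finish in state n_states - 1."""
    alpha = {0: 1.0}  # alpha[s]: mass of paths producing labels so far, now in s
    for lab in labels:
        nxt = {}
        for s, p in alpha.items():
            for (i, j), tp in trans_p.items():
                if i == s:
                    op = out_p.get((i, j, lab), 0.0)
                    nxt[j] = nxt.get(j, 0.0) + p * tp * op
        alpha = nxt
    return alpha.get(n_states - 1, 0.0)

# Hypothetical two-transition chain that tends to output 'A' then 'B':
trans_p = {(0, 1): 1.0, (1, 2): 1.0}
out_p = {(0, 1, 'A'): 0.9, (0, 1, 'B'): 0.1,
         (1, 2, 'A'): 0.2, (1, 2, 'B'): 0.8}
likelihood = forward_likelihood(['A', 'B'], 3, trans_p, out_p)
```

Biasing in step (b) then means nudging the stored values so that this quantity, evaluated on the correct word's baseform, grows relative to the same quantity evaluated on a competing word's baseform.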
4. In a speech recognition system in which labels, from an alphabet of labels, are generated by an acoustic processor at successive label times in response to a speech input and in which words or portions thereof are represented probabilistically by Markov models, wherein each Markov model is characterized by (i) states, (ii) transitions between states, and (iii) probability items, wherein some probability items have previously defined probability values θ′ which correspond to the likelihood of a transition in a given model being taken and wherein other probability items have previously defined probability values θ′ which correspond to the likelihood of a specific label being produced at a transition of one or more predefined transitions in a given model, a method of evaluating counts from which enhanced probability values are derived comprising the steps of:

(a) storing for each probability item a preliminary value θ′;

(b) defining and storing a set of counts wherein each probability item is determined from the value of at least one count associated therewith in storage, each count in the set having a value corresponding to the probability of a specific transition τi being taken from a specific state Sj given (i) a specific label interval time t, (ii) a specific string of generated labels, and (iii) the stored θ′ values;

(c) uttering a known subject word and generating outputs in response thereto;

(d) selecting an incorrect word other than the known word and, for each count used in deriving the value of a probability item in said incorrect word model, determining a minus count value from the generated outputs of the uttered known word; and

(e) defining an adjusted count value wherein the stored value of each count serves as an addend and the minus value of each count serves as a subtrahend thereof.

(Dependent claims: 5-20)
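Step (b) defines each count as the probability of a specific transition τi being taken from state Sj at a label time t, given the generated label string and the stored θ′ values. Summed over label times, these are the posterior transition counts of the forward-backward algorithm. The sketch below is one way to compute them, assuming (as in the earlier sketch, and not recited in the claim) a left-to-right model with start state 0, final state n_states - 1, and per-transition outputs.

```python
def transition_counts(labels, n_states, trans_p, out_p):
    """Expected number of times each transition is taken, given the label
    string: the count of step (b), summed over label interval times t."""
    T = len(labels)
    # alpha[t][s]: probability of producing labels[:t] and being in state s
    alpha = [dict() for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t, lab in enumerate(labels):
        for s, p in alpha[t].items():
            for (i, j), tp in trans_p.items():
                if i == s:
                    op = out_p.get((i, j, lab), 0.0)
                    alpha[t + 1][j] = alpha[t + 1].get(j, 0.0) + p * tp * op
    # beta[t][s]: probability of producing labels[t:] starting from state s
    beta = [dict() for _ in range(T + 1)]
    beta[T][n_states - 1] = 1.0
    for t in range(T - 1, -1, -1):
        lab = labels[t]
        for (i, j), tp in trans_p.items():
            if j in beta[t + 1]:
                op = out_p.get((i, j, lab), 0.0)
                beta[t][i] = beta[t].get(i, 0.0) + tp * op * beta[t + 1][j]
    total = alpha[T].get(n_states - 1, 0.0)  # likelihood of the whole string
    counts = {}
    for t, lab in enumerate(labels):
        for (i, j), tp in trans_p.items():
            num = (alpha[t].get(i, 0.0) * tp *
                   out_p.get((i, j, lab), 0.0) * beta[t + 1].get(j, 0.0))
            if total > 0.0:
                counts[(i, j)] = counts.get((i, j), 0.0) + num / total
    return counts

trans_p = {(0, 1): 1.0, (1, 2): 1.0}
out_p = {(0, 1, 'A'): 0.9, (0, 1, 'B'): 0.1,
         (1, 2, 'A'): 0.2, (1, 2, 'B'): 0.8}
counts = transition_counts(['A', 'B'], 3, trans_p, out_p)
```

On this reading, the minus counts of step (d) would be the same quantities evaluated with the incorrect word's model against the labels generated for the known uttered word, and step (e) subtracts them from the stored counts.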
21. In a speech recognition system which decodes a vocabulary word from a string of output labels, each output label being selected from an alphabet of output labels in response to an uttered word input wherein each word in the vocabulary is represented by a baseform of at least one probabilistic finite state machine and wherein each probabilistic machine has transition probability items and output probability items, apparatus for determining probability values for probability items comprising:
means for storing a current probability value for each probability item; and

means for biasing the stored current probability values to enhance the likelihood that outputs generated in response to the utterance of a known spoken word input are produced by the baseform for the known word relative to the respective likelihood of the generated outputs being produced by the baseform for at least one other word.
Specification