Determination of phone weights for Markov models in a speech recognition system

US 4,741,036 A
Filed: 01/31/1985
Issued: 04/26/1988
Est. Priority Date: 01/31/1985
Status: Expired due to Term

Abstract

In a speech recognition system, discrimination between similar-sounding uttered words is improved by weighting the probability vector data stored for the Markov model representing the reference word sequence of phones. The weighting vector is derived for each reference word by comparing similar-sounding utterances using Viterbi alignment and a multivariate analysis that maximizes the differences between the multivariate distributions for correct and incorrect recognition.

17 Claims

1. In a speech recognition system having
- (a) a speech processor which converts input word utterances into coded label strings, and
- (b) a stored vocabulary comprising for each word a model comprising (i) a plurality of phones representation, and (ii) statistical data including label probabilities,

wherein the probabilities that any label string represents the phones of a given word is indicated by corresponding probability vectors, and in which the label string of each word utterance to be recognized is matched in a Viterbi alignment procedure against word models in the vocabulary, whereby the word having the highest probability for the respective label string is selected as output word, a speech recognition method for improving the capability of discriminating between similar utterances corresponding to different words, the method comprising the steps of:
- (a) identifying for each label string of a plurality of utterances, in a fast match procedure, a subset of coarsely matching candidate words and indicating which of these represented the correct word and which not,
- (b) generating for each word an inverted list of label strings for which it was selected in the fast match procedure, and indicating whether the selection was correct or not,
- (c) generating for each word, using the label strings identified in the inverted fast match output list and using the statistical data of the respective word model, a set of probability vectors in a Viterbi alignment procedure, each for one label string and carrying a designation whether the initial fast match selection was correct or wrong,
- (d) generating for each word, from the sets of probability vectors, in a linear discriminant analysis procedure, a weighting vector, and
- (e) weighting, during an actual speech recognition process, the probability vector elements by the associated weighting vector elements.
- 2. A speech recognition method in accordance with claim 1, characterized in that the linear discriminant analysis procedure of step (d) includes generating a threshold value in association with each weighting vector, and that in an additional step following step (e), the weighted probability vector elements are accumulated and compared against the threshold value for generating a selection criterion.
- 3. A speech recognition method in accordance with claim 2, characterized in that in each matching procedure matching an utterance against a word model, a score value is obtained according to the equation:

  $$\text{score} = w(0) + \sum_{i=1}^{N} w(i)\,p(i)$$

  wherein w(0) is the threshold value for the respective word model, w(i) are the elements of the weighting vector, p(i) are the phone probabilities obtained in the matching procedure, and N is the number of phones of the respective word model.
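
The accumulation and threshold comparison described in claims 2 and 3 can be illustrated with a minimal sketch. It assumes the score is the threshold w(0) plus the weighted sum of the phone probabilities, which is what the definitions of w(0), w(i), p(i) and N suggest, and it assumes that a positive score means "accept". The function names `weighted_score` and `accept`, and the use of log probabilities in the example values, are illustrative assumptions rather than anything stated in the claims.

```python
from typing import Sequence


def weighted_score(threshold: float,
                   weights: Sequence[float],
                   phone_probs: Sequence[float]) -> float:
    """Assumed form of the score: w(0) + sum over i of w(i) * p(i)."""
    if len(weights) != len(phone_probs):
        raise ValueError("need exactly one weight per phone of the word model")
    return threshold + sum(w * p for w, p in zip(weights, phone_probs))


def accept(threshold: float,
           weights: Sequence[float],
           phone_probs: Sequence[float]) -> bool:
    """Assumed selection criterion: accept the word when the score is positive."""
    return weighted_score(threshold, weights, phone_probs) > 0.0


# Hypothetical two-phone word model; the p(i) values stand in for
# per-phone log probabilities produced by the alignment.
example = weighted_score(0.5, [1.0, 2.0], [-1.0, -0.25])
```

With these example values the weighted sum is -1.5, so adding the threshold of 0.5 gives a negative score and the candidate would be rejected under the assumed sign convention.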

4. A method for speech recognition using a stored vocabulary of statistical word models, each word model comprising a plurality of elements, in which method each utterance is converted to a coded form and each coded utterance is matched against at least a subset of the word models for generating one probability vector per each utterance and word model, and in which method a final output word is selected for each utterance on the basis of the respective probability vectors, characterized by the following additional steps for improving the discrimination capability of the recognition method:
- in a preliminary training operation:
  - (a) producing at least one coded utterance for each word, and matching each such coded utterance against all word models to obtain a subset of candidate words for the respective utterance,
  - (b) performing a fine match for each utterance against each candidate word model obtained in step (a), and generating a probability vector for the candidate word with respect to the utterance,
  - (c) generating for each probability vector an indication whether the utterance was actually representing the word or not,
  - (d) collecting for each word model all generated probability vectors and correctness indications,
  - (e) generating, in a discriminant analysis procedure involving all generated probability vectors and correctness indications, a weighting vector for each word model, and
  - (f) storing said weighting vectors for all word models;
- in each actual speech recognition process:
  - using said weighting vector of each word model to modify each probability vector that was generated by matching a coded utterance against the respective word model.
- 5. A speech recognition method in accordance with claim 4, characterized in that the number of elements in each generated probability vector and in each weighting vector is equal to the number of elements of the respective word model.
- 6. A speech recognition method in accordance with claim 5, characterized in that in the discriminant analysis procedure of step (e), also a weighting threshold value is generated for each word model, and that in each actual speech recognition process, a decision value is generated according to the equation:

  $$\text{decision value} = w(0) + \sum_{i=1}^{N} w(i)\,p(i)$$

  wherein w(0) is the weighting threshold value for the respective word model, w(i) are the elements of the weighting vector, p(i) are the elements of the probability vector, and N is the number of elements in the word model.
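
Steps (c) and (d) of claim 4 amount to regrouping the fine-match output by word model and tagging each probability vector as correct or incorrect. The sketch below shows one way to do that bookkeeping. It assumes the coarse and fine matches have already been run elsewhere; the `Match` type, the `collect_training_vectors` function and the example words and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Match(NamedTuple):
    word: str                 # candidate word returned by the coarse match
    phone_probs: List[float]  # probability vector from the fine match


# For each word model: a list of (probability vector, was the candidate correct?)
TrainingSet = Dict[str, List[Tuple[List[float], bool]]]


def collect_training_vectors(
    utterances: Sequence[Tuple[str, Sequence[Match]]]
) -> TrainingSet:
    """Group fine-match vectors by candidate word and label each one.

    Each input item is (word actually spoken, candidates for that utterance).
    """
    per_word: Dict[str, List[Tuple[List[float], bool]]] = defaultdict(list)
    for spoken_word, candidates in utterances:
        for match in candidates:
            is_correct = match.word == spoken_word
            per_word[match.word].append((match.phone_probs, is_correct))
    return dict(per_word)


# Hypothetical training data: two utterances and their candidate lists.
training = collect_training_vectors([
    ("concern", [Match("concern", [-1.0, -2.0]),
                 Match("concerned", [-1.5, -2.5, -4.0])]),
    ("concerned", [Match("concerned", [-1.0, -1.0, -1.0])]),
])
```

The resulting per-word lists are the input a discriminant analysis such as step (e) would need: every vector in a list has the same length as the word model it belongs to, and each carries its correctness indication.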

7. A method for improving the recognition capability of a speech recognition system in which utterances in coded form are matched against stored word models to generate probability values for selecting the word having the highest probability score as output for each utterance, characterized in that in a preliminary training operation, a coarse selection of similar candidate words is made for training utterances, that each training utterance is aligned to the respective candidate word models to generate probability values, and that the generated probability values for all pairs of a candidate word and an associated training utterance, plus an indication for each pair whether it is a correct or incorrect combination, are analyzed in a processing step to generate a set of weighting coefficients for each word model.
- 8. A method in accordance with claim 7, characterized in that in the analyzing processing step, a linear discriminant analysis is performed.
- 9. A method in accordance with claim 7, for a speech recognition system wherein each word model represents a plurality of phones and each alignment of an utterance to a word model produces a probability value for each phone, characterized in that the number of weighting coefficients generated for each word model is equal to the respective number of phones, and that in an actual speech recognition process, each phone probability value is weighted by the associated weighting coefficient of the respective word model.
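
The recognition-time weighting of claim 9 can be sketched as choosing, among the candidate words, the one whose weighted per-phone scores sum highest. This is only an illustration: it assumes the per-phone values are log probabilities (so larger is better), it leaves out any threshold term because claim 9 does not mention one, and the names `weighted_word_score` and `recognize` and all example values are hypothetical.

```python
from typing import Mapping, Sequence


def weighted_word_score(weights: Sequence[float],
                        phone_scores: Sequence[float]) -> float:
    """Sum of per-phone scores, each scaled by its weighting coefficient."""
    if len(weights) != len(phone_scores):
        raise ValueError("one weighting coefficient is needed per phone")
    return sum(w * p for w, p in zip(weights, phone_scores))


def recognize(phone_scores: Mapping[str, Sequence[float]],
              weights: Mapping[str, Sequence[float]]) -> str:
    """Return the candidate word with the highest weighted score."""
    return max(
        phone_scores,
        key=lambda word: weighted_word_score(weights[word], phone_scores[word]),
    )


# Hypothetical per-phone log probabilities for a single utterance.
candidates = {"concern": [-2.0, -2.0], "concerned": [-1.0, -1.0, -3.0]}
uniform = {"concern": [1.0, 1.0], "concerned": [1.0, 1.0, 1.0]}
trained = {"concern": [1.0, 1.0], "concerned": [1.0, 1.0, 0.5]}

unweighted_choice = recognize(candidates, uniform)
weighted_choice = recognize(candidates, trained)
```

In this made-up example the uniform weights favor "concern" (-4.0 against -5.0), while down-weighting the third phone of "concerned" raises its score to -3.5 and changes the selected word, which is the kind of effect the per-phone weights are meant to have on similar-sounding candidates.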

10. A method of improving discrimination between similar words in a vocabulary in a speech recognition system, the method comprising the steps of:
- collecting all words which, when uttered, have been characterized as candidates for a given word;
- indicating when an uttered word is correctly or wrongly identified as to the given word;
- forming for each uttered word a respective probability vector, each component of each respective probability vector being associated with a corresponding phone in the given word; and
- determining for each component a measure indicative of the variation between (a) each component of the probability vector for a correctly recognized uttered word, and (b) the corresponding component of the probability vectors of the wrongly recognized uttered word.
- 11. A method according to claim 10 comprising the further step of:
  - assigning a weight to each phone in the given word based on the measure of variation in the probability vector component corresponding thereto.
- 12. A method according to claim 11 wherein the forming step includes the step of:
  - performing a Viterbi alignment of each uttered word against a Markov model of the given word.
- 13. A method according to claim 12 wherein the measure determining step includes:
  - forming a first multivariate distribution for the probability vectors for correctly recognized uttered words;
  - forming a second multivariate distribution for the probability vectors of wrongly recognized uttered words; and
  - determining the linear function which maximizes the differences between the first and second multivariate distributions.
- 14. A method according to claim 13 wherein each word in the vocabulary is processed as the given word, each word in the vocabulary thereby being assigned a weight for each phone thereof.
- 15. A method according to claim 14 comprising the further step of:
  - applying to each phone along a path in a Viterbi decoder the weight assigned thereto; and
  - performing word recognition in the Viterbi decoder which has the phones along the path thereof weighted.
- 16. A method according to claim 15 wherein the collecting step includes the step of:
  - performing an approximate acoustic match to provide a list of candidate words which have relatively high probability of corresponding to a particular spoken word input.
- 17. A method according to claim 10 wherein the collecting step includes the step of:
  - performing an approximate acoustic match to provide a list of candidate words which have relatively high probability of corresponding to a particular spoken word input.
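
Claim 13 asks for the linear function that best separates the distribution of probability vectors for correctly recognized utterances from that for wrongly recognized ones. One standard technique with that goal is Fisher's linear discriminant, sketched below in plain Python. The claims do not name this particular formulation, so treat it as an assumption; the small `ridge` term, the midpoint threshold, the helper names and the example vectors are likewise illustrative choices and not taken from the patent.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def scatter(vectors: Sequence[Sequence[float]], mu: Sequence[float]) -> List[Vector]:
    """Sum of outer products of deviations from the class mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [x - m for x, m in zip(v, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fisher_weights(correct: Sequence[Sequence[float]],
                   wrong: Sequence[Sequence[float]],
                   ridge: float = 1e-6) -> Tuple[float, Vector]:
    """Return (w0, w) so that w0 + w . p is positive on the 'correct' side."""
    mu_c, mu_w = mean(correct), mean(wrong)
    s_c, s_w = scatter(correct, mu_c), scatter(wrong, mu_w)
    d = len(mu_c)
    # Within-class scatter; the ridge keeps the matrix invertible (an assumption).
    within = [[s_c[i][j] + s_w[i][j] + (ridge if i == j else 0.0)
               for j in range(d)] for i in range(d)]
    w = solve(within, [c - x for c, x in zip(mu_c, mu_w)])
    # Threshold placed at the midpoint between the two projected class means.
    midpoint = [(c + x) / 2 for c, x in zip(mu_c, mu_w)]
    w0 = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w0, w


# Hypothetical two-phone word: only the second phone separates the classes.
correct_vecs = [[-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1]]
wrong_vecs = [[-1.0, -3.0], [-1.1, -3.2], [-0.9, -2.9]]
w0, w = fisher_weights(correct_vecs, wrong_vecs)
```

In this example the second phone receives the larger weight, because the first phone scores about the same for correct and wrong utterances and therefore carries little discriminating information.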

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bahl, Lalit R.; Mercer, Robert L.; DeSouza, Peter V.
Primary Examiner(s): Kemeny, Emanuel S.
Application Number: US06/696,976
Time in Patent Office: 1,181 Days
Field of Search: 381/41-47, 364/513.5
US Class Current: 704/256
CPC Class Codes: G10L 15/144 Training of HMMs; G10L 2015/025 Phonemes, fenemes or fenone...
