Determination of phone weights for markov models in a speech recognition system
Abstract
In a speech recognition system, discrimination between similar-sounding uttered words is improved by weighting the probability vector data stored for the Markov model representing the reference word sequence of phones. The weighting vector is derived for each reference word by comparing similar-sounding utterances using Viterbi alignment and a multivariate analysis that maximizes the differences between the multivariate distributions of correct and incorrect recognitions.
17 Claims
1. In a speech recognition system having
  (a) a speech processor which converts input word utterances into coded label strings, and
  (b) a stored vocabulary comprising for each word a model comprising
    (i) a plurality of phones representation, and
    (ii) statistical data including label probabilities,
wherein the probabilities that any label string represents the phones of a given word is indicated by corresponding probability vectors, and in which the label string of each word utterance to be recognized is matched in a Viterbi alignment procedure against word models in the vocabulary, whereby the word having the highest probability for the respective label string is selected as output word,
a speech recognition method for improving the capability of discriminating between similar utterances corresponding to different words, the method comprising the steps of:
  (a) identifying for each label string of a plurality of utterances, in a fast match procedure, a subset of coarsely matching candidate words and indicating which of these represented the correct word and which not,
  (b) generating for each word an inverted list of label strings for which it was selected in the fast match procedure, and indicating whether the selection was correct or not,
  (c) generating for each word, using the label strings identified in the inverted fast match output list and using the statistical data of the respective word model, a set of probability vectors in a Viterbi alignment procedure, each for one label string and carrying a designation whether the initial fast match selection was correct or wrong,
  (d) generating for each word, from the sets of probability vectors, in a linear discriminant analysis procedure, a weighting vector, and
  (e) weighting, during an actual speech recognition process, the probability vector elements by the associated weighting vector elements.
Dependent claims: 2, 3
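Steps (b) and (d) of claim 1 can be illustrated with a minimal Python sketch. Nothing in it is taken from the patent text: the function names, the data layout (tuples of utterance id, spoken word, and fast-match candidate list), the choice of a two-class Fisher discriminant as the linear discriminant analysis, and the small ridge term that keeps the scatter matrix invertible are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_lists(fast_match_results):
    """Step (b), sketched: map each candidate word to the utterances that
    selected it, with a flag telling whether that selection was correct.

    fast_match_results: iterable of (utterance_id, spoken_word, candidates).
    This layout is hypothetical, not prescribed by the claim.
    """
    inverted = defaultdict(list)
    for utt_id, spoken_word, candidates in fast_match_results:
        for word in candidates:
            inverted[word].append((utt_id, word == spoken_word))
    return dict(inverted)


def _mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def _scatter(vectors, mu):
    """Sum of outer products of deviations from the class mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fisher_weights(correct, wrong, ridge=1e-6):
    """Step (d), sketched as a two-class Fisher discriminant:
    w = S_w^{-1} (mean_correct - mean_wrong), with S_w the pooled
    within-class scatter. The ridge term is an added assumption.
    """
    if not correct or not wrong:
        raise ValueError("need at least one vector in each class")
    mu_c, mu_w = _mean(correct), _mean(wrong)
    s_c, s_w = _scatter(correct, mu_c), _scatter(wrong, mu_w)
    d = len(mu_c)
    pooled = [[s_c[i][j] + s_w[i][j] + (ridge if i == j else 0.0)
               for j in range(d)] for i in range(d)]
    return _solve(pooled, [mu_c[i] - mu_w[i] for i in range(d)])
```

In this sketch each probability vector would hold one value per phone of the word model (for example a log-probability from the Viterbi alignment), so the resulting weighting vector has one element per phone.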
4. A method for speech recognition using a stored vocabulary of statistical word models, each word model comprising a plurality of elements, in which method each utterance is converted to a coded form and each coded utterance is matched against at least a subset of the word models for generating one probability vector per each utterance and word model, and in which method a final output word is selected for each utterance on the basis of the respective probability vectors,
characterized by the following additional steps for improving the discrimination capability of the recognition method:
in a preliminary training operation;
  (a) producing at least one coded utterance for each word, and matching each such coded utterance against all word models to obtain a subset of candidate words for the respective utterance,
  (b) performing a fine match for each utterance against each candidate word model obtained in step (a), and generating a probability vector for the candidate word with respect to the utterance,
  (c) generating for each probability vector an indication whether the utterance was actually representing the word or not,
  (d) collecting for each word model all generated probability vectors and correctness indications,
  (e) generating, in a discriminant analysis procedure involving all generated probability vectors and correctness indications, a weighting vector for each word model, and
  (f) storing said weighting vectors for all word models;
in each actual speech recognition process;
  using said weighting vector of each word model to modify each probability vector that was generated by matching a coded utterance against the respective word model.
Dependent claims: 5, 6
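The recognition-time step of claim 4 can be sketched as follows. The claim says only that the weighting vector is used to modify the probability vector; this sketch assumes, for illustration, that each vector holds per-phone log-probabilities, that the modification is an element-wise product, and that the final word score is the sum of the weighted elements. The names `weighted_score` and `recognize` are hypothetical.

```python
def weighted_score(prob_vector, weights):
    """Sum of element-wise weighted probability-vector entries (assumed form)."""
    if len(prob_vector) != len(weights):
        raise ValueError("probability vector and weighting vector differ in length")
    return sum(w * p for w, p in zip(weights, prob_vector))


def recognize(prob_vectors, weight_vectors):
    """Pick the candidate word with the highest weighted score.

    prob_vectors:   {word: [log-probability per phone]} for one utterance
    weight_vectors: {word: [weight per phone]} stored after training;
                    words without a stored vector fall back to unit weights.
    """
    scores = {}
    for word, pv in prob_vectors.items():
        weights = weight_vectors.get(word, [1.0] * len(pv))
        scores[word] = weighted_score(pv, weights)
    best = max(scores, key=scores.get)
    return best, scores


# Two similar candidates for one utterance; values are made up for illustration.
candidates = {"cat": [-2.0, -3.0, -6.0], "cap": [-2.0, -3.0, -5.5]}
stored = {"cat": [1.0, 1.0, 0.5], "cap": [1.0, 1.0, 1.5]}
print(recognize(candidates, {})[0])      # unit weights
print(recognize(candidates, stored)[0])  # trained weights change the ranking
```

With unit weights the two candidates differ only in the last phone; the stored weights rescale that element, which is the effect the weighting is meant to have on similar-sounding words.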
7. A method for improving the recognition capability of a speech recognition system in which utterances in coded form are matched against stored word models to generate probability values for selecting the word having the highest probability score as output for each utterance,
characterized in that in a preliminary training operation, a coarse selection of similar candidate words is made for training utterances, that each training utterance is aligned to the respective candidate word models to generate probability values, and that the generated probability values for all pairs of a candidate word and an associated training utterance, plus an indication for each pair whether it is a correct or incorrect combination, are analyzed in a processing step to generate a set of weighting coefficients for each word model.
10. A method of improving discrimination between similar words in a vocabulary in a speech recognition system, the method comprising the steps of:
  collecting all words which, when uttered, have been characterized as candidates for a given word;
  indicating when an uttered word is correctly or wrongly identified as to the given word;
  forming for each uttered word a respective probability vector, each component of each respective probability vector being associated with a corresponding phone in the given word; and
  determining for each component a measure indicative of the variation between
    (a) each component of the probability vector for a correctly recognized uttered word, and
    (b) the corresponding component of the probability vectors of the wrongly recognized uttered word.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
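Claim 10 does not name the per-component measure. One possible choice, shown here purely as an assumption, is a per-component Fisher ratio: the squared gap between the class means divided by the sum of the class variances. The function name and the `eps` guard against division by zero are hypothetical.

```python
from statistics import mean, pvariance


def component_separation(correct, wrong, eps=1e-9):
    """Per-phone separation between correctly and wrongly recognized utterances.

    correct, wrong: lists of probability vectors, one component per phone.
    Returns one value per component; larger means that phone separates the
    two groups more strongly. The Fisher-ratio form is an assumed measure.
    """
    measures = []
    # zip(*vectors) regroups the data so each tuple holds one component
    # across all vectors of a class.
    for c_vals, w_vals in zip(zip(*correct), zip(*wrong)):
        gap = mean(c_vals) - mean(w_vals)
        spread = pvariance(c_vals) + pvariance(w_vals)
        measures.append(gap * gap / (spread + eps))
    return measures
```

A component whose values are nearly identical in both groups yields a measure close to zero, which suggests that the corresponding phone contributes little to telling the given word apart from its confusable candidates.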
Specification