Constructing Markov model word baseforms from multiple utterances by concatenating model sequences for word segments
First Claim
1. In a Markov model speech recognition system having an acoustic processor which generates a string of labels in response to an uttered input where each label is one of an alphabet of labels, a computerized method of constructing Markov model word baseforms comprising the steps of:
(a) for each of a set of Markov models in which each Markov model corresponds to a respective label and in which each Markov model has (i) a plurality of states and (ii) a plurality of arcs wherein each arc extends from a state to a state, computing and storing in computer memory arc probabilities and label output probabilities wherein each label output probability represents the likelihood of a given label being produced at a given arc;
(b) generating, with the acoustic processor, n respective strings of labels in response to each of n utterances of a subject word selected from a vocabulary of words;
(c) selecting the string of labels having a length which is closest to the average length of all strings generated in step (b);
(d) concatenating in sequence the Markov models which correspond to the successive labels in the selected string and storing the concatenated sequence;
(e) for a string other than the selected string, aligning successive substrings of zero or more labels against successive Markov models in the concatenated sequence, based on the stored probabilities;
(f) repeating step (e) for each generated string of step (b) other than the selected string, each string generated in step (b) having a respective substring corresponding to each Markov model in the concatenated sequence of step (d);
(g) partitioning the generated strings of step (b) into successive common segments, the ith common segment of each string corresponding to the ith substring thereof; and
(h) constructing a sequence of one or more Markov models for each ith common segment based on the ith label of the prototype string and the ith substrings of the other strings.
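Steps (c) and (d) can be illustrated with a short Python sketch. It assumes that labels are integer identifiers and that the Markov model for a label can be named by the label itself. The function names `select_prototype` and `singleton_baseform` are hypothetical, and the tie-breaking rule (the earliest utterance wins) is an assumption, since the claim does not specify one.

```python
from typing import List, Sequence


def select_prototype(strings: Sequence[Sequence[int]]) -> int:
    """Return the index of the label string whose length is closest
    to the average length of all strings (step (c))."""
    if not strings:
        raise ValueError("need at least one label string")
    mean_len = sum(len(s) for s in strings) / len(strings)
    # min() keeps the first minimum, so ties go to the earliest
    # utterance. This tie-break is an assumption of the sketch.
    return min(range(len(strings)),
               key=lambda i: abs(len(strings[i]) - mean_len))


def singleton_baseform(prototype: Sequence[int]) -> List[int]:
    """Concatenate, in order, the model ids that correspond to the
    successive labels of the selected string (step (d)).
    Assumption: model id == label id."""
    return list(prototype)


utterances = [[1, 2], [1, 2, 3, 4], [1, 2, 3], [1, 2, 3, 4, 5, 6]]
proto_index = select_prototype(utterances)
baseform = singleton_baseform(utterances[proto_index])
```

Here the lengths are 2, 4, 3 and 6, with an average of 3.75, so the second string (length 4) is selected and the concatenated sequence holds four models.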
Abstract
The present invention relates to apparatus and method for segmenting multiple utterances of a vocabulary word in a consistent and coherent manner and determining a Markov model sequence for each segment. A fenemic Markov model corresponds to each label.
280 Citations
18 Claims
9. In a speech recognition system, a computerized method used in determining Markov model sequences for words in a vocabulary based on multiple utterances of each word, the method comprising the steps of:
(a) generating, from an acoustic processor which assigns one of an alphabet of speech-type labels to each successive interval of speech, a respective string of labels for each utterance of a subject word;
(b) storing the respective strings in computer memory; and
(c) partitioning the generated strings for each utterance of the subject word into successive word segments; wherein step (c) includes the steps of;
(d) computing and storing arc probabilities and label output probabilities for each of a set of Markov models, wherein each Markov model in the set corresponds to a respective label;
(e) retrieving from storage the generated string corresponding to a prototype utterance for a subject word;
(f) selecting the one Markov model after another in sequence which corresponds to the respective one label after another generated by the acoustic processor for the prototype utterance;
(g) aligning each Markov model for the prototype utterance against labels generated for another utterance of the subject word, wherein the successive Markov models for the prototype utterance are aligned against successive substrings for said other utterance based on the stored probabilities; and
(h) repeating step (g) for each utterance other than the prototype utterance; the ith label of the prototype string and the ith substring of each other string representing the ith segment of each respective utterance.
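The alignment in step (g) can be sketched as a Viterbi search over a lattice of (model, label) positions. The sketch below is only an illustration under several assumptions that the claim does not state: each model has a self-loop arc and a forward arc that both emit one label, plus a null arc that emits nothing; the three arc probabilities are shared by all models; and each model has a single output distribution `out_prob[model][label]` rather than one per arc. The name `align_to_baseform` and the default probability values are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def _log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def align_to_baseform(labels: Sequence[int], baseform: Sequence[int],
                      out_prob: Sequence[Sequence[float]],
                      p_self: float = 0.3, p_fwd: float = 0.6,
                      p_null: float = 0.1) -> List[List[int]]:
    """Split `labels` into len(baseform) successive substrings (each of
    zero or more labels) along the most probable path."""
    M, L = len(baseform), len(labels)
    # score[i][j]: best log probability of standing at the start of
    # model i after consuming the first j labels.
    score = [[NEG_INF] * (L + 1) for _ in range(M + 1)]
    back = [[""] * (L + 1) for _ in range(M + 1)]
    score[0][0] = 0.0
    for i in range(M + 1):
        for j in range(L + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i < M and j > 0:   # self-loop of model i emits labels[j-1]
                e = _log(out_prob[baseform[i]][labels[j - 1]])
                cands.append((score[i][j - 1] + _log(p_self) + e, "self"))
            if i > 0 and j > 0:   # forward arc of model i-1 emits labels[j-1]
                e = _log(out_prob[baseform[i - 1]][labels[j - 1]])
                cands.append((score[i - 1][j - 1] + _log(p_fwd) + e, "fwd"))
            if i > 0:             # null arc of model i-1 emits nothing
                cands.append((score[i - 1][j] + _log(p_null), "null"))
            if cands:
                score[i][j], back[i][j] = max(cands)
    if score[M][L] == NEG_INF:
        raise ValueError("labels cannot be aligned to this baseform")
    # Trace back, counting how many labels each model consumed.
    counts = [0] * M
    i, j = M, L
    while (i, j) != (0, 0):
        move = back[i][j]
        if move == "self":
            counts[i] += 1
            j -= 1
        elif move == "fwd":
            counts[i - 1] += 1
            i, j = i - 1, j - 1
        else:
            i -= 1
    substrings, start = [], 0
    for c in counts:
        substrings.append(list(labels[start:start + c]))
        start += c
    return substrings


out = [[0.9, 0.1], [0.1, 0.9]]      # toy output probabilities
parts = align_to_baseform([0, 0, 1], [0, 1], out)
```

With these toy probabilities, the string `[0, 0, 1]` splits into `[0, 0]` for the first model and `[1]` for the second. The null arc is what allows a model to receive an empty substring, for example when the utterance is shorter than the prototype.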
16. Apparatus for constructing a Markov model word baseform for a word in a vocabulary from multiple utterances thereof comprising:
acoustic processor means for generating a string of labels in response to an uttered speech input;
means, coupled to receive label string outputs from the acoustic processor means, for storing labels for multiple strings of labels generated by the acoustic processor in response to multiple utterances of a subject word;
means for retrieving a prototype string from among the stored strings for the subject word;
means, coupled to receive as input a retrieved prototype string, for forming a singleton word baseform for the retrieved prototype string;
means, coupled to retrieve label strings from the label string storing means and coupled to the singleton baseform forming means, for aligning the labels in strings other than the selected prototype string against the singleton baseform, each string being divided into successive substrings respectively aligned against successive fenemic Markov models in the singleton baseform; and
correlator means, coupled to receive input alignment data from the aligning means, for grouping the ith substrings of the multiple strings; each group of ith substrings corresponding to a common word segment.
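The grouping performed by the correlator means can be pictured with a small Python sketch. It assumes the alignment data arrive as one list of substrings per non-prototype utterance, with exactly one substring per label of the prototype string; the name `group_common_segments` and this data layout are hypothetical.

```python
from typing import List, Sequence


def group_common_segments(
        prototype: Sequence[int],
        alignments: Sequence[Sequence[Sequence[int]]]) -> List[List[List[int]]]:
    """For each position i, collect the ith label of the prototype and
    the ith substring of every other utterance into one group."""
    for aligned in alignments:
        if len(aligned) != len(prototype):
            raise ValueError("each alignment needs one substring per prototype label")
    segments = []
    for i, label in enumerate(prototype):
        group = [[label]] + [list(aligned[i]) for aligned in alignments]
        segments.append(group)
    return segments


prototype = [7, 3]
alignments = [
    [[7, 7], [3]],   # utterance 2: two labels, then one label
    [[], [3, 4]],    # utterance 3: empty substring, then two labels
]
segments = group_common_segments(prototype, alignments)
```

Each entry of `segments` then holds the material for one common word segment, with the prototype's single label listed first.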
Specification