Speech recognition by concatenating fenonic allophone hidden Markov models in parallel among subwords
Abstract
A word input from a speech input device 1 is analyzed for its features by a feature extractor 4 to obtain a feature vector sequence corresponding to the word, or a label sequence is obtained by applying a further transformation in a labeler 8. Fenonic hidden Markov models for speech transformation candidates are combined with N-gram probabilities (where N is an integer greater than or equal to 2) to produce models of words. The recognizer determines the probability that the speech model composed for each candidate word would output the label sequence or feature vector sequence input as speech, and outputs the candidate word corresponding to the speech model having the highest probability to a display 19.
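The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, only one way to read the description under stated assumptions: N is fixed at 2 (bigram), each fenone is reduced to a single emitting state with a shared self-loop probability (no null transitions), and all names, label alphabets, and probability values (LOOP, emit, word_probability, recognize, the toy dictionary) are hypothetical. For each candidate word, the allophone models of every subword are placed in parallel, linked by bigram probabilities, and a forward pass computes the probability that the composed model outputs the observed label sequence.

```python
LOOP = 0.3  # hypothetical self-loop probability shared by all fenone states


def emit(fenone, label):
    """Hypothetical emission table over labels A-D: a fenone mostly emits its own label."""
    return 0.85 if fenone == label else 0.05


def word_probability(subwords, bigram, labels):
    """Forward probability that the composed word model outputs `labels`.

    subwords: one dict per subword, {allophone_name: fenone string}, i.e. the
              parallel speech transformation candidates of that subword.
    bigram:   {(previous_allophone_or_None, allophone): probability}.
    """
    if not labels:
        return 0.0
    # Every state is (subword index, allophone name, fenone position).
    states = [(k, a, j) for k, cands in enumerate(subwords)
              for a, fen in cands.items() for j in range(len(fen))]
    alpha = {s: 0.0 for s in states}
    # Start in the first fenone of any allophone of the first subword.
    for a, fen in subwords[0].items():
        alpha[(0, a, 0)] = bigram.get((None, a), 0.0) * emit(fen[0], labels[0])
    for label in labels[1:]:
        new = {s: 0.0 for s in states}
        for (k, a, j), p in alpha.items():
            if p == 0.0:
                continue
            new[(k, a, j)] += p * LOOP  # stay in the same fenone
            if j + 1 < len(subwords[k][a]):
                new[(k, a, j + 1)] += p * (1 - LOOP)  # next fenone, same allophone
            elif k + 1 < len(subwords):
                # Leave this allophone and enter any parallel candidate of the
                # next subword, weighted by the bigram relation.
                for b in subwords[k + 1]:
                    new[(k + 1, b, 0)] += p * (1 - LOOP) * bigram.get((a, b), 0.0)
        alpha = {(k, a, j): q * emit(subwords[k][a][j], label)
                 for (k, a, j), q in new.items()}
    last = len(subwords) - 1
    # Only paths ending in the final fenone of the last subword count.
    return sum(p * (1 - LOOP) for (k, a, j), p in alpha.items()
               if k == last and j == len(subwords[k][a]) - 1)


def recognize(dictionary, bigram, labels):
    """Score every candidate word and return the most probable one with all scores."""
    scores = {w: word_probability(m, bigram, labels) for w, m in dictionary.items()}
    return max(scores, key=scores.get), scores


# Toy dictionary: two candidate words, each a list of subwords with parallel allophones.
dictionary = {
    "word_x": [{"x1": "AB", "x2": "AC"}, {"y1": "D"}],
    "word_z": [{"z1": "CC"}, {"w1": "B", "w2": "D"}],
}
bigram = {
    (None, "x1"): 0.6, (None, "x2"): 0.4, ("x1", "y1"): 1.0, ("x2", "y1"): 1.0,
    (None, "z1"): 1.0, ("z1", "w1"): 0.5, ("z1", "w2"): 0.5,
}
best, scores = recognize(dictionary, bigram, list("ACD"))
```

In this toy setup the label sequence A, C, D matches the path through allophone x2 followed by y1, so word_x scores higher than word_z. A real system would replace the hand-written tables with trained fenonic model parameters and estimated N-gram statistics.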
6 Claims
1. A speech recognizer comprising:
means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
means for retaining hidden Markov models respectively for one or more allophones of subwords of each speech transformation candidate;
dictionary means for retaining a plurality of candidate words to be recognized;
means for composing a speech model by concatenating each hidden Markov model for allophones of each speech transformation candidate in parallel among subwords in correspondence to a candidate word;
means for determining a probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting the candidate word corresponding to a speech model of a highest probability as a result of recognition.
2. A speech recognizer comprising:
means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
means for retaining fenonic hidden Markov models;
means for retaining the label sequence for each speech transformation candidate on which subwords of a word are transformed as speech;
dictionary means for retaining a plurality of candidate words to be recognized;
means for applying fenonic hidden Markov models of allophones of subwords of each speech transformation candidate in correspondence to said candidate words and concatenating the models for each speech transformation candidate in parallel among the subwords to compose a speech model;
means for determining a probability of a speech model composed with regard to each candidate word so as to output the label sequence or feature vector sequence of said word inputted as speech, and outputting the candidate word corresponding to the speech model of a highest probability as a result of recognition.
3. A speech recognizer comprising:
means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
means for retaining hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
dictionary means for retaining a plurality of candidate words to be recognized;
means for concatenating each hidden Markov model for each speech transformation candidate in parallel among the subwords in correspondence to said candidate words and on the basis of said N-gram relation to compose a speech model;
means for determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.
4. A speech recognizer comprising:
means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
means for retaining fenonic hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
dictionary means for retaining a plurality of candidate words to be recognized;
means for applying hidden Markov models to each speech transformation candidate in correspondence to said candidate words and on the basis of said N-gram relation, and concatenating each hidden Markov model for each of these speech transformation candidates in parallel among the subwords to compose a speech model;
means for determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.
5. A method of speech recognition comprising the steps of:
retaining fenonic hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
retaining label sequences for each speech transformation candidate on which the subwords of a word are transformed as speech;
retaining a plurality of candidate words to be recognized;
analyzing a word inputted as speech for its features and obtaining a label sequence or feature vector sequence corresponding to the word concerned;
applying fenonic hidden Markov models to each speech transformation candidate in correspondence to said candidate words and on the basis of said N-gram relation;
concatenating each fenonic hidden Markov model for each of these speech transformation candidates in parallel among the subwords to compose a speech model;
determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.
Specification