Apparatus and method for speech recognition
Abstract
A true/false judgment on a speech recognition result is made with high accuracy and a small volume of processing. By comparing acoustic models HMMsb against the feature vector sequence V(n) of utterances, a recognition result RCG specifying the acoustic model HMMsb having the maximum likelihood, a first score FSCR indicating the value of the maximum likelihood, and a second score SSCR indicating the value of the second highest likelihood are found. Then, by comparing an evaluation value FSCR×(FSCR−SSCR), based on the first score FSCR and the second score SSCR, with a pre-set threshold value THD, a true/false judgment on the recognition result RCG is made. When the recognition result RCG is judged as being true, speaker adaptation is applied to the acoustic models HMMsb; when it is judged as being false, speaker adaptation is not applied. It is thus possible to improve the accuracy of speaker adaptation.
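The judgment described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `RecognitionResult`, `evaluation_value`, `is_judged_true`, and `maybe_adapt` are hypothetical, and the abstract only says the evaluation value is compared with THD, so treating "at least THD" as true is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecognitionResult:
    rcg: str     # recognition result RCG (label of the best-matching models)
    fscr: float  # first score FSCR: value of the maximum likelihood
    sscr: float  # second score SSCR: value of the second highest likelihood


def evaluation_value(fscr: float, sscr: float) -> float:
    """Evaluation value FSCR x (FSCR - SSCR), as given in the abstract."""
    return fscr * (fscr - sscr)


def is_judged_true(result: RecognitionResult, thd: float) -> bool:
    # Assumption: the result is judged true when the evaluation value
    # is greater than or equal to THD. The source does not fix the direction.
    return evaluation_value(result.fscr, result.sscr) >= thd


def maybe_adapt(result: RecognitionResult, thd: float,
                adapt: Callable[[RecognitionResult], None]) -> bool:
    """Run the (hypothetical) adaptation callback only for results judged true."""
    if is_judged_true(result, thd):
        adapt(result)
        return True
    return False
```

For example, with a hypothetical threshold of 30.0, scores of 10.0 and 4.0 give an evaluation value of 60.0 and adaptation runs, whereas scores of 10.0 and 9.0 give 10.0 and adaptation is skipped. A large first score combined with a wide margin over the runner-up is what pushes the value over the threshold.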
Robust speaker adaptation that is not susceptible to the influence of background noise is achieved. Initial acoustic models Mc are stored in advance in a speaker adapted model storing section, and a noise adapting section generates noise adapted models Mc′ by applying noise adaptation to the initial acoustic models Mc pre-stored in the speaker adapted model storing section. A speaker adaptation parameter calculating section generates speaker adaptation parameters P based on the noise adapted models Mc′ and a feature vector sequence V(n) of utterances from the speaker, and an acoustic model updating section generates speaker adapted models Mc″ by applying speaker adaptation processing to the initial acoustic models Mc using the speaker adaptation parameters P. The initial acoustic models Mc are then replaced with the speaker adapted models Mc″, which are newly stored in the speaker adapted model storing section. At the time of speech recognition, the noise adapting section generates noise-adapted speaker adapted models Mreg by applying noise adaptation to the updated and newly stored speaker adapted models Mc″ instead of the initial acoustic models Mc. Then, a speech recognition section performs speech recognition by comparing sequences formed by the models Mreg against the feature vector sequence V(n) of utterances to be recognized.
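The data flow in the paragraph above (Mc to Mc′, then P, then Mc″) can be sketched with a toy Python model. Everything here is an assumption for illustration: each acoustic model is reduced to a single mean vector, noise adaptation is a simple additive shift, and P is an average bias. The source does not specify any of these computations, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Models = Dict[str, Vector]  # model label -> mean vector (toy stand-in for HMMs)


def noise_adapt(models: Models, noise: Sequence[float]) -> Models:
    # Assumption: noise simply adds to each mean; a stand-in for real noise adaptation.
    return {k: [m + n for m, n in zip(mean, noise)] for k, mean in models.items()}


def speaker_params(noise_adapted: Models,
                   aligned: Sequence[Tuple[str, Sequence[float]]]) -> Vector:
    # P: average offset between observed frames V(n) and the noise adapted means Mc'.
    if not aligned:
        raise ValueError("need at least one aligned frame")
    dim = len(next(iter(noise_adapted.values())))
    total = [0.0] * dim
    for label, frame in aligned:
        mean = noise_adapted[label]
        for i in range(dim):
            total[i] += frame[i] - mean[i]
    return [t / len(aligned) for t in total]


def update_models(initial: Models, p: Sequence[float]) -> Models:
    # Mc'': apply P to the initial models Mc, not to the noise adapted models Mc'.
    return {k: [m + d for m, d in zip(mean, p)] for k, mean in initial.items()}


mc = {"a": [1.0, 2.0], "b": [3.0, 4.0]}           # initial models Mc
noise = [0.5, 0.5]                                # background noise at adaptation time
frames = [("a", [2.5, 2.5]), ("b", [4.5, 4.5])]   # frames aligned to model labels
mc_noise = noise_adapt(mc, noise)                 # Mc'
p = speaker_params(mc_noise, frames)              # P
mc_speaker = update_models(mc, p)                 # Mc'', stored in place of Mc
```

In this toy setup the frames contain a speaker offset of [1.0, 0.0] plus the noise, so P recovers only the speaker offset and the stored models Mc″ stay free of the noise seen during adaptation. At recognition time, `noise_adapt(mc_speaker, current_noise)` would play the role of Mreg.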
16 Claims
1. A speech recognition apparatus for applying speaker adaptation to acoustic models based on feature vectors of utterances, said apparatus comprising:
speech recognition device for comparing said acoustic models against said feature vector of utterances, and outputting a recognition result specifying a sequence of acoustic models having maximum likelihood, a first score indicating a value of the maximum likelihood, and a second score indicating a value of second highest likelihood;
judging device for comparing an evaluation value based on said first score and said second score with a pre-set threshold value, and judging said recognition result as being true when said evaluation value is in a specific relation with respect to said threshold value; and
speaker adaptation processing device for applying speaker adaptation to said acoustic models once said judging device judges said recognition result as being true.
Dependent claims: 2, 3, 4
5. A speech recognition method for applying speaker adaptation to acoustic models based on feature vectors of utterances, said method comprising:
a first step of comparing said acoustic models against said feature vector of utterances, and outputting a recognition result specifying a sequence of acoustic models having maximum likelihood, a first score indicating a value of the maximum likelihood, and a second score indicating a value of second highest likelihood;
a second step of comparing an evaluation value based on said first score and said second score with a pre-set threshold value, and judging said recognition result as being true when said evaluation value is in a specific relation with respect to said threshold value; and
a third step of applying speaker adaptation to said acoustic models when said recognition result is judged as being true in said second step.
Dependent claims: 6, 7, 8
9. A speech recognition apparatus, comprising:
storage device having initial acoustic models;
noise adapting device for generating noise adapted models by applying noise adaptation to said initial acoustic models in said storage device using background noises at a time of speaker adaptation;
speaker adaptation parameter calculating device for performing speaker adaptation computation with respect to said noise adapted models generated in said noise adapting device, using utterances uttered at the time of said speaker adaptation, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
acoustic model updating device for generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
Dependent claim: 10
11. A speech recognition apparatus, comprising:
storage device having initial acoustic models;
noise adapting device for generating noise adapted models by applying noise adaptation to said initial acoustic models in said storage device, using background noises during a silent period at a time of speech recognition;
recognition processing device for performing speech recognition by comparing utterances uttered during an utterance period at the time of said speech recognition and to be subjected to speech recognition, against said noise adapted models generated in said noise adapting device;
speaker adaptation parameter calculating device for performing speaker adaptation computation with respect to said noise adapted models generated in said noise adapting device, using said utterances to be subjected to speech recognition, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
acoustic model updating device for generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
Dependent claim: 12
13. A speech recognition method, comprising:
a noise adaptation processing step of generating noise adapted models by applying noise adaptation to initial acoustic models stored in storage device, using background noises at a time of speaker adaptation;
a speaker adaptation parameter calculating step of performing speaker adaptation computation with respect to said noise adapted models generated in said noise adaptation processing step, using utterances uttered at the time of said speaker adaptation, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
an acoustic model updating step of generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
Dependent claim: 14
15. A speech recognition method, comprising:
a noise adaptation processing step of generating noise adapted models by applying noise adaptation to initial acoustic models stored in storage device, using background noises during a silent period at a time of speech recognition;
a recognition processing step of performing speech recognition by comparing utterances to be uttered during an utterance period at the time of said speech recognition and to be subjected to speech recognition, against said noise adapted models generated in said noise adaptation processing step;
a speaker adaptation parameter calculating step of performing speaker adaptation computation with respect to said noise adapted models generated in said noise adaptation processing step, using said utterances to be subjected to speech recognition, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
an acoustic model update processing step of generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
Dependent claim: 16
Specification