Apparatus and method for speech recognition
Abstract
Before speech recognition is executed, a composite acoustic model adapted to noise is generated by composing a noise adaptive representative acoustic model, obtained by noise-adaptation of each representative acoustic model, with the difference models stored in advance in a storing section. A noise and speaker adaptive acoustic model is then generated by applying speaker-adaptation to the composite acoustic model using the feature vector series of uttered speech. A renewal difference model is generated from the difference between the noise and speaker adaptive acoustic model and the noise adaptive representative acoustic model, and it replaces the difference model stored in the storing section. Speech recognition is performed by comparing the feature vector series of the uttered speech to be recognized with the composite acoustic model adapted to noise and speaker, which is generated by composing the noise adaptive representative acoustic model with the renewal difference model.
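The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every acoustic model is reduced to a single mean vector, `noise_adapt` is a hypothetical additive shift, and `speaker_adapt` is a hypothetical interpolation toward the mean of the observed frames. All function and parameter names are assumptions introduced for illustration only.

```python
# Toy sketch: each "acoustic model" is reduced to one mean vector (an assumption).

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def noise_adapt(representative, noise_offset):
    """Hypothetical stand-in for noise adaptation: shift the mean by a noise offset."""
    return add(representative, noise_offset)

def speaker_adapt(composite, frames, weight=0.5):
    """Hypothetical stand-in for speaker adaptation: move the composite mean
    part of the way toward the mean of the observed feature vectors."""
    frame_mean = [sum(col) / len(frames) for col in zip(*frames)]
    return [(1 - weight) * c + weight * m for c, m in zip(composite, frame_mean)]

def build_composites(representatives, differences, group_of, noise_offset):
    """Noise-adapt one representative per group, then add each stored
    difference model to the adapted representative of its group."""
    adapted = {g: noise_adapt(rep, noise_offset) for g, rep in representatives.items()}
    composites = {name: add(adapted[group_of[name]], diff)
                  for name, diff in differences.items()}
    return adapted, composites

def renew_differences(adapted, composites, group_of, frames_for_model):
    """Speaker-adapt each composite that has adaptation frames and return
    renewal differences (speaker-adapted model minus adapted representative)."""
    renewed = {}
    for name, composite in composites.items():
        frames = frames_for_model.get(name)
        # Models without adaptation frames keep their current difference.
        target = speaker_adapt(composite, frames) if frames else composite
        renewed[name] = sub(target, adapted[group_of[name]])
    return renewed
```

Under these assumptions, only the group representatives are noise-adapted when the noise changes; the stored differences, which carry the speaker adaptation, are simply added back on top.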
6 Claims
1. A speech recognition apparatus for recognizing speech by comparing composite acoustic models adapted to noise and speaker with a feature vector series extracted from an uttered speech, comprising:
a storing section for previously storing each representative acoustic model selected as a representative of acoustic models belonging to one of groups, each of said groups being formed beforehand by classifying a large number of acoustic models on a basis of a similarity, difference models of each group obtained from difference between said acoustic models belonging to one of said groups and said representative acoustic model of said identical group, and group information for corresponding said representative acoustic models with said difference models every said identical group;
a generating section for generating each noise adaptive representative acoustic model of said each group by noise-adaptation executed to said representative acoustic model of said each group stored in said storing section;
a generating section for generating each composite acoustic model of said each group by composition of said difference model and said noise adaptive representative acoustic model using said group information;
a renewal model generating section for generating noise and speaker adaptive acoustic models by performing a speaker-adaptation of said composite acoustic model every identical group, using the feature vector series obtained from the uttered speech; and
a model renewal section for replacing said difference models of said each group by renewal difference models which are generated by taking differences between said noise and speaker adaptive acoustic models and said noise adaptive representative acoustic models selected via said group information;
wherein a speech recognition is performed by comparing the feature vector series extracted from the uttered speech to be recognized with said composite acoustic model adapted to noise and speaker, and wherein said composite acoustic model adapted to noise and speaker is generated by composition of said renewal difference model and said noise adaptive representative acoustic model, which is generated by a noise-adaptation of said representative acoustic model of said group including said renewal difference model selected via said group information.
2. A speech recognition apparatus for recognizing speech by comparing composite acoustic models adapted to noise and speaker with a feature vector series extracted from an uttered speech, comprising:
a storing section for previously storing each representative acoustic model selected as a representative of acoustic models belonging to one of groups, each of said groups being formed beforehand by classifying a large number of acoustic models on a basis of a similarity, difference models of each group obtained from difference between said acoustic models belonging to one of said groups and said representative acoustic model of said identical group, and group information for corresponding said representative acoustic models with said difference models every said identical group;
a generating section for generating each noise adaptive representative acoustic model of said each group by noise-adaptation executed to said representative acoustic model of said each group stored in said storing section;
a generating section for generating each composite acoustic model of said each group by composition of said difference model and said noise adaptive representative acoustic model using said group information;
a recognition processing section for recognizing speech by comparing said composite acoustic models generated in said generating section for composite acoustic models with the feature vector series extracted from the uttered speech to be recognized;
a renewal model generating section for generating noise and speaker adaptive acoustic models by performing a speaker-adaptation of said composite acoustic model every identical group, using the feature vector series obtained from the uttered speech; and
a model renewal section for replacing said difference models of said each group by renewal difference models which are generated by taking differences between said noise and speaker adaptive acoustic models and said noise adaptive representative acoustic models selected via said group information;
wherein said recognition processing section performs a speech recognition by comparing the feature vector series extracted from the uttered speech to be recognized with said composite acoustic model adapted to noise and speaker generated by composition of said noise adaptive representative acoustic model generated by noise-adaptation of said representative acoustic model of each group including said renewal difference model selected with said group information and said renewal difference model renewed by said renewal model generating section and said model renewal section, every repetition of the speech recognition.
4. A speech recognition method for recognizing speech by comparing a set of composite acoustic models adapted to noise and speaker with a feature vector series extracted from an uttered speech, comprising the steps of:
previously storing, in a storing section, each representative acoustic model selected as a representative of acoustic models belonging to one of groups, each of said groups being formed beforehand by classing a large number of acoustic models on a basis of a similarity, difference models of each group obtained from difference between said acoustic models belonging to one of said groups and said representative acoustic model of said identical group, and group information for corresponding said representative acoustic models with said difference models every said identical group;
generating each noise adaptive representative acoustic model of said each group by noise-adaptation executed to said representative acoustic model of said each group stored in the storing section;
generating each composite acoustic model of said each group by composition of said difference model and said noise adaptive representative acoustic model using said group information;
generating noise and speaker adaptive acoustic models by performing a speaker-adaptation of said composite acoustic model every identical group, using the feature vector series obtained from the uttered speech; and
replacing said stored difference models of said each group by renewal difference models which are generated by taking differences between said noise and speaker adaptive acoustic models and said noise adaptive representative acoustic models selected via said group information;
wherein said speech recognition is performed by comparing the feature vector series extracted from the uttered speech to be recognized with said composite acoustic model adapted to noise and speaker, and wherein said composite acoustic model adapted to noise and speaker is generated by composition of said renewal difference model and said noise adaptive representative acoustic model, which is generated by a noise-adaptation of said representative acoustic model of said group including said renewal difference model selected via said group information.
5. A speech recognition method for recognizing speech by comparing a set of composite acoustic models adapted to noise and speaker with a feature vector series extracted from an uttered speech, comprising the steps of:
previously storing, in a storing section, each representative acoustic model selected as a representative of acoustic models belonging to one of groups, each of said groups being formed beforehand by classing a large number of acoustic models on a basis of a similarity, difference models of each group obtained from difference between said acoustic models belonging to one of said groups and said representative acoustic model of said identical group, and group information for corresponding said representative acoustic models with said difference models every said identical group;
generating each noise adaptive representative acoustic model of said each group by noise-adaptation executed to said representative acoustic model of said each group stored in the storing section;
generating each composite acoustic model of said each group by composition of said difference model and said noise adaptive representative acoustic model using said group information;
recognizing a speech by comparing said composite acoustic models generated in said generating step for composite acoustic models with the feature vector series extracted from the uttered speech to be recognized;
generating noise and speaker adaptive acoustic models by performing a speaker-adaptation of said composite acoustic model every identical group, using the feature vector series obtained from the uttered speech; and
replacing said stored difference models of said each group by renewal difference models which are generated by taking differences between said noise and speaker adaptive acoustic models and said noise adaptive representative acoustic models selected via said group information;
wherein said recognition processing step performs a speech recognition by comparing the feature vector series extracted from the uttered speech to be recognized with said composite acoustic model adapted to noise and speaker generated by composition of said noise adaptive representative acoustic model generated by noise-adaptation of said representative acoustic model of each group including said renewal difference model selected with said group information and said renewal difference model renewed by said noise and speaker adaptive acoustic models generating step and said difference models replacing step, every repetition of the speech recognition.
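The per-repetition ordering recited in claims 2 and 5 (compose, recognize, speaker-adapt, renew the stored difference) can be sketched as one cycle in Python. This is a simplified illustration under stated assumptions, not the claimed method itself: models are single mean vectors, noise adaptation is a hypothetical additive shift, recognition is a nearest-mean comparison standing in for likelihood scoring, and only the recognized model is speaker-adapted. The names `recognize` and `recognition_cycle` are hypothetical.

```python
def recognize(composites, frames):
    """Pick the model whose composite mean is closest to the frames
    (squared distance as a stand-in for likelihood scoring)."""
    def cost(mean):
        return sum((x - m) ** 2 for frame in frames for x, m in zip(frame, mean))
    return min(composites, key=lambda name: cost(composites[name]))

def recognition_cycle(representatives, differences, group_of,
                      noise_offset, frames, weight=0.5):
    """Run one recognition repetition and renew the stored difference in place."""
    # 1. Noise-adapt each group representative (hypothetical additive shift).
    adapted = {g: [r + n for r, n in zip(rep, noise_offset)]
               for g, rep in representatives.items()}
    # 2. Compose: adapted representative of the group plus stored difference.
    composites = {name: [a + d for a, d in zip(adapted[group_of[name]], diff)]
                  for name, diff in differences.items()}
    # 3. Recognize against the composites built from the current differences.
    best = recognize(composites, frames)
    # 4. Speaker-adapt the recognized composite toward the frame mean
    #    (assumption: only the recognized model is adapted).
    frame_mean = [sum(col) / len(frames) for col in zip(*frames)]
    speaker_adapted = [(1 - weight) * c + weight * f
                       for c, f in zip(composites[best], frame_mean)]
    # 5. Renewal difference replaces the stored difference for that model.
    differences[best] = [s - a
                         for s, a in zip(speaker_adapted, adapted[group_of[best]])]
    return best
```

Calling `recognition_cycle` once per utterance with the same `differences` dictionary means each repetition recognizes against composites built from the differences renewed by the previous one, even when `noise_offset` changes between calls.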
Specification