Adaptation of acoustic prototype vectors in a speech recognition system
Abstract
In a speech recognition system, the prior parameters of acoustic prototype vectors are adapted to a new speaker to obtain posterior parameters by having the speaker utter a set of adaptation words. The prior parameters of an acoustic prototype vector are adapted by a weighted sum of displacement vectors obtained from the adaptation utterances. Each displacement vector is associated with one segment of an uttered adaptation word. Each displacement vector represents the distance between the associated segment of the adaptation utterance and the model corresponding to that segment. Each displacement vector is weighted by the strength of the relationship of the acoustic prototype vector to the word segment model corresponding to the displacement vector.
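The adaptation rule in the abstract can be written compactly. The symbols below are introduced here for illustration and are not defined in this excerpt. The sign convention (utterance minus model) and any normalization of the weights are assumptions.

```latex
\mathbf{d}_j = \bar{\mathbf{x}}_j - \bar{\mathbf{m}}_j,
\qquad
\mathbf{p}_k^{\mathrm{post}} = \mathbf{p}_k^{\mathrm{prior}} + \sum_{j} w_{jk}\,\mathbf{d}_j
```

Here $\bar{\mathbf{x}}_j$ is a representative feature vector of segment $j$ of an adaptation utterance, $\bar{\mathbf{m}}_j$ is the representative vector of the model for that segment, $\mathbf{d}_j$ is the displacement vector, and $w_{jk}$ is the strength of the relationship between prototype $k$ and segment $j$. If the weights are normalized so that $\sum_j w_{jk} = 1$ for each $k$, the correction becomes a weighted average of displacements. This excerpt does not say whether such normalization is used.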
195 Citations
9 Claims
1. A speech recognition system performing a frequency analysis of an input speech for each period to obtain feature vectors, producing the corresponding label train using a vector quantization code book, matching a plurality of word baseforms expressed by a train of Markov models each corresponding to labels, with said label train, and recognizing the input speech on the basis of the matching result, and comprising:
a means for dividing each of a plurality of word input speeches into N segments (N is an integer number more than 1) and producing a representative value of the feature vector of each segment of each of said word input speeches;
a means for dividing word baseforms each corresponding to said word input speeches and producing a representative value of each segment feature vector of each word baseform on the basis of prototype vectors of said vector quantization code book;
a means for producing displacement vectors indicating the displacements between the representative values of the segments of the word input speeches and the representative values of the corresponding segments of the corresponding word baseforms;
a means for storing the degree of relation between each segment of said each word input speech and each label in a label group of the vector quantization code book; and
a prototype adaptation means for correcting a prototype vector of each label of said vector quantization code book by said each displacement vector in accordance with the degree of relation between the label and the segment.
Dependent claims: 2, 3, 4, 5.
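The segment-wise adaptation of claim 1 can be sketched in Python. This is a minimal, hypothetical illustration and not the patented implementation. It makes four assumptions:

- The "representative value" is the arithmetic mean.
- Each word baseform, which the claim describes as a train of Markov models, is represented by a label sequence whose prototype vectors approximate its feature trajectory.
- The "degree of relation" between a segment and a label is the number of that segment's frames quantized to the label.
- Each prototype moves by the relation-weighted average of the displacement vectors.

All function names are illustrative. Every utterance and baseform must contain at least N items.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise arithmetic mean, assumed here as the 'representative value'."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def split(seq: Sequence, n: int) -> List[Sequence]:
    """Divide seq into n contiguous, non-empty segments (requires len(seq) >= n)."""
    size = len(seq)
    bounds = [i * size // n for i in range(n + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n)]


def nearest_label(x: Vector, prototypes: Dict[int, Vector]) -> int:
    """Vector quantization: label of the prototype closest to x (squared Euclidean)."""
    return min(prototypes, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])))


def adapt_prototypes(
    utterances: List[List[Vector]],  # feature vectors of each adaptation word
    baseforms: List[List[int]],      # hypothetical: label sequence standing in for each baseform
    prototypes: Dict[int, Vector],   # code book: label -> prior prototype vector
    n_segments: int,                 # N, the number of segments per word
) -> Dict[int, Vector]:
    dim = len(next(iter(prototypes.values())))
    # Numerator and denominator of the relation-weighted average displacement per label.
    num = {k: [0.0] * dim for k in prototypes}
    den = {k: 0.0 for k in prototypes}
    for frames, labels in zip(utterances, baseforms):
        base_vectors = [prototypes[label] for label in labels]
        for seg_frames, seg_base in zip(split(frames, n_segments), split(base_vectors, n_segments)):
            # Displacement between the input segment and the matching baseform segment.
            d = [a - b for a, b in zip(mean(seg_frames), mean(seg_base))]
            # Hypothetical degree of relation: each frame of this segment quantized
            # to label k adds one unit of weight between the segment and label k.
            for x in seg_frames:
                k = nearest_label(x, prototypes)
                den[k] += 1.0
                for i in range(dim):
                    num[k][i] += d[i]
    # Posterior prototype = prior + weighted average displacement; unrelated labels unchanged.
    return {
        k: [p[i] + num[k][i] / den[k] for i in range(dim)] if den[k] > 0 else list(p)
        for k, p in prototypes.items()
    }
```

Dividing by the summed weights keeps each shift within the range of the displacements that affect that prototype. The claim itself says only that the correction follows the degree of relation between label and segment.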
6. A speech recognition system performing a frequency analysis of an input speech for each period to obtain feature vectors, producing the corresponding label train using a vector quantization code book, matching a plurality of word baseforms expressed by a train of Markov models each corresponding to labels, with said label train and recognizing the input speech on the basis of the matching result, comprising:
a means for producing a representative value of feature vectors in each of a plurality of word input speeches;
a means for producing a representative value of feature vectors in the word baseform corresponding to said word input speech, based upon prototype vectors of said vector quantization code book;
a means for producing a displacement vector indicating the displacement between the representative value of each word input speech and the representative value of the corresponding word baseform;
a means for storing the degree of relation between said each word input speech and each label in the vector quantization code book; and
a prototype adaptation means for correcting a prototype vector of each label in the label group of said vector quantization code book by said each displacement vector in accordance with the degree of relation between the label and the word input speech.
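Claim 6 computes one displacement vector per whole word instead of one per segment. A minimal, hypothetical sketch in Python follows. It uses the same illustrative assumptions as a segment-wise reading (mean as the representative value, and a label sequence standing in for each Markov-model baseform). Here the "degree of relation" between a word and a label is taken to be the fraction of the word's frames quantized to that label. That measure is an assumption, not one stated in the claim.

```python
from collections import Counter
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise arithmetic mean, assumed as the 'representative value'."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def nearest_label(x: Vector, prototypes: Dict[int, Vector]) -> int:
    """Label of the closest prototype under squared Euclidean distance."""
    return min(prototypes, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])))


def adapt_whole_words(
    utterances: List[List[Vector]],  # feature vectors of each adaptation word
    baseforms: List[List[int]],      # hypothetical label sequence for each baseform
    prototypes: Dict[int, Vector],   # code book: label -> prior prototype vector
) -> Dict[int, Vector]:
    dim = len(next(iter(prototypes.values())))
    num = {k: [0.0] * dim for k in prototypes}
    den = {k: 0.0 for k in prototypes}
    for frames, labels in zip(utterances, baseforms):
        # One displacement vector per word: input mean minus baseform mean.
        base_mean = mean([prototypes[label] for label in labels])
        d = [a - b for a, b in zip(mean(frames), base_mean)]
        # Hypothetical degree of relation: share of the word's frames quantized to each label.
        counts = Counter(nearest_label(x, prototypes) for x in frames)
        for k, c in counts.items():
            r = c / len(frames)
            den[k] += r
            for i in range(dim):
                num[k][i] += r * d[i]
    # Posterior = prior + relation-weighted average displacement; unrelated labels unchanged.
    return {
        k: [p[i] + num[k][i] / den[k] for i in range(dim)] if den[k] > 0 else list(p)
        for k, p in prototypes.items()
    }
```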
7. A speaker-adaptable speech recognition apparatus comprising:
means for measuring the value of at least one feature of an utterance, said utterance occurring over a series of successive time intervals of equal duration, said means measuring the feature value of the utterance during each time interval to produce a series of feature vector signals representing the feature values;
means for storing a finite set of prototype vector signals, each prototype vector signal having at least one parameter having a prior value;
means for comparing the feature value of each feature vector signal, in the series of feature vector signals produced by the measuring means as a result of the utterance, to the prior parameter values of the prototype vector signals to determine, for each feature vector signal, the closest associated prototype vector signal, to produce an utterance-based series of prototype vector signals;
means for generating a correlation signal having a value proportional to the correlation between the utterance-based series of prototype vector signals and a first prototype signal;
means for modeling the utterance with a model-based series of prototype vector signals;
means for calculating a displacement vector signal having a value representing the distance between the series of feature vector signals and the series of model-based prototype vector signals; and
means for providing the first prototype signal with a posterior parameter value equal to its prior parameter value plus an offset proportional to the product of the value of the displacement vector signal multiplied by the value of the correlation signal.
Dependent claims: 8, 9.
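The signal chain of claim 7 can be sketched as one Python function. This is a minimal, hypothetical illustration. The claim does not say how the correlation signal or the displacement is computed, so the sketch makes three labeled assumptions:

- The correlation signal is the fraction of time intervals whose closest prototype is the one being adapted.
- The displacement is the mean feature vector minus the mean of the model-based prototype vectors.
- The proportionality constant `alpha` defaults to 1.0.

The model-based series is supplied as a list of prototype labels. All names are illustrative.

```python
from typing import Dict, List

Vector = List[float]


def closest_label(x: Vector, prototypes: Dict[int, Vector]) -> int:
    """Closest prototype to a feature vector, using prior values (squared Euclidean)."""
    return min(prototypes, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise arithmetic mean."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def posterior_prototype(
    features: List[Vector],         # one feature vector per equal-duration time interval
    prototypes: Dict[int, Vector],  # prototype vectors holding prior parameter values
    model_labels: List[int],        # hypothetical model-based series, given as labels
    target: int,                    # label of the "first prototype" being adapted
    alpha: float = 1.0,             # hypothetical proportionality constant
) -> Vector:
    # Utterance-based series: closest prototype for each feature vector.
    utterance_labels = [closest_label(x, prototypes) for x in features]
    # Hypothetical correlation signal: share of intervals labeled with the target prototype.
    corr = utterance_labels.count(target) / len(utterance_labels)
    # Hypothetical displacement: mean feature vector minus mean model-based prototype vector.
    model_vectors = [prototypes[k] for k in model_labels]
    disp = [a - b for a, b in zip(mean(features), mean(model_vectors))]
    # Posterior = prior + offset proportional to displacement times correlation.
    prior = prototypes[target]
    return [p + alpha * d * corr for p, d in zip(prior, disp)]
```

As a usage example, with prototypes at (0, 0) and (10, 10), four intervals split evenly between them, and model labels `[0, 1]`, each prototype moves by half of the mean displacement.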
Specification