Adaptation of acoustic prototype vectors in a speech recognition system

US 5,046,099 A
Filed: 02/27/1990
Issued: 09/03/1991
Est. Priority Date: 03/13/1989
Status: Expired due to Fees

Abstract

In a speech recognition system, the prior parameters of acoustic prototype vectors are adapted to a new speaker to obtain posterior parameters by having the speaker utter a set of adaptation words. The prior parameters of an acoustic prototype vector are adapted by a weighted sum of displacement vectors obtained from the adaptation utterances. Each displacement vector is associated with one segment of an uttered adaptation word. Each displacement vector represents the distance between the associated segment of the adaptation utterance and the model corresponding to that segment. Each displacement vector is weighted by the strength of the relationship of the acoustic prototype vector to the word segment model corresponding to the displacement vector.
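
The update described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the function name `adapt_prototypes`, the list-of-lists data layout, and the assumption that the weights are applied without further normalization are all hypothetical choices made for the example.

```python
def adapt_prototypes(prototypes, displacements, weights):
    """Return posterior prototypes: prior plus a weighted sum of displacements.

    prototypes:    K prior prototype vectors (lists of floats)
    displacements: S displacement vectors, one per adaptation word segment
    weights:       K x S table; weights[k][s] is the assumed strength of the
                   relation between prototype k and segment s
    """
    adapted = []
    for prior, row in zip(prototypes, weights):
        posterior = list(prior)  # copy so the prior parameters are untouched
        for weight, disp in zip(row, displacements):
            for dim, component in enumerate(disp):
                posterior[dim] += weight * component
        adapted.append(posterior)
    return adapted


# Toy data (invented for illustration): two prototypes, two segments.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
displacements = [[1.0, 0.0], [0.0, 2.0]]
weights = [[0.5, 0.5], [1.0, 0.0]]
adapted = adapt_prototypes(prototypes, displacements, weights)
```

A prototype with zero weight for a segment is unaffected by that segment's displacement, which is why the second prototype above moves only along the first displacement vector.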

195 Citations

9 Claims

1. A speech recognition system performing a frequency analysis of an input speech for each period to obtain feature vectors, producing the corresponding label train using a vector quantization code book, matching a plurality of word baseforms expressed by a train of Markov models each corresponding to labels, with said label train, and recognizing the input speech on the basis of the matching result, and comprising:
- a means for dividing each of a plurality of word input speeches into N segments (N is an integer number more than 1) and producing a representative value of the feature vector of each segment of each of said word input speeches;
  
  a means for dividing word baseforms each corresponding to said word input speeches and producing a representative value of each segment feature vector of each word baseform on the basis of prototype vectors of said vector quantization code book;
  
  a means for producing displacement vectors indicating the displacements between the representative values of the segments of the word input speeches and the representative values of the corresponding segments of the corresponding word baseforms;
  
  a means for storing the degree of relation between each segment of said each word input speech and each label in a label group of the vector quantization code book; and
  
  a prototype adaptation means for correcting a prototype vector of each label of said vector quantization code book by said each displacement vector in accordance with the degree of relation between the label and the segment.
- Dependent Claims (2, 3, 4, 5)
- - 2. A speech recognition system according to claim 1, wherein the representative value of each segment feature vector of said each word input speech is an average value of the feature vectors in the segment.
  - 3. A speech recognition system according to claim 2, wherein the representative value of each segment feature vector of said each word baseform is an average value of the prototype vectors of the labels in the segment.
  - 4. A speech recognition system according to claim 3, wherein the degree of relation between each segment of said each word input speech and each label in the label group of the vector quantization code book is proportional to the probability $P(L_k \mid i,j) = \sum_l P(L_k \mid M_l)\, P(M_l \mid i,j)$, where P(L_k | i,j) is the degree of relation between the segment j of the word input speech for the word i and the label L_k in the vector quantization code book, P(L_k | M_l) is the output probability of the label L_k in Markov model M_l, and P(M_l | i,j) is the probability of occurrence of Markov model M_l given the observation of the segment j of the word i.
  - 5. A speech recognition system according to claim 4, wherein in said prototype adaptation means each label prototype vector in the label group of said vector quantization code book is given by ##EQU4## where F_k is a prototype vector for label L_k before correction, F_k' is a prototype vector for the label L_k after correction, S_ij is a representative value of the feature vector in the segment j of the word input speech for the word i, and B_ij is representative vector in the segment j of the word baseform for the word i.
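
The segment averages of claims 2 and 3 and the degree of relation of claim 4 might be computed roughly as follows. This is a sketch under stated assumptions, not the patent's implementation: the helper names, the equal-length segmentation, the representation of a word baseform as a list of label indices, and the reading of the claim 4 probability as a sum over Markov models of P(L_k | M_l) P(M_l | i,j) are all assumptions made for illustration. Each word is also assumed to contain at least N frames.

```python
def split_into_segments(items, n_segments):
    """Divide a sequence into n_segments contiguous, near-equal parts
    (equal division is an assumption; the claims do not fix the rule)."""
    total = len(items)
    bounds = [s * total // n_segments for s in range(n_segments + 1)]
    return [items[bounds[s]:bounds[s + 1]] for s in range(n_segments)]


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def segment_displacements(speech_frames, baseform_labels, prototypes, n_segments):
    """One displacement vector S_ij - B_ij per segment j of a single word i."""
    speech_segs = split_into_segments(speech_frames, n_segments)
    label_segs = split_into_segments(baseform_labels, n_segments)
    result = []
    for frames, labels in zip(speech_segs, label_segs):
        s_ij = mean_vector(frames)                           # claim 2
        b_ij = mean_vector([prototypes[k] for k in labels])  # claim 3
        result.append([s - b for s, b in zip(s_ij, b_ij)])
    return result


def degree_of_relation(label_given_model, model_given_segment):
    """P(L_k | i,j) for every label k, mixing over Markov models M_l.

    label_given_model[l][k] holds P(L_k | M_l);
    model_given_segment[l] holds P(M_l | i,j).
    """
    n_labels = len(label_given_model[0])
    return [
        sum(p_m * label_given_model[l][k]
            for l, p_m in enumerate(model_given_segment))
        for k in range(n_labels)
    ]


# Toy data (invented): two prototypes, one four-frame word, N = 2.
prototypes = [[0.0, 0.0], [2.0, 2.0]]
speech_frames = [[1.0, 0.0], [1.0, 2.0], [3.0, 2.0], [3.0, 4.0]]
baseform_labels = [0, 0, 1, 1]
disps = segment_displacements(speech_frames, baseform_labels, prototypes, 2)
relation = degree_of_relation([[0.5, 0.5], [0.25, 0.75]], [0.5, 0.5])
```

In the toy data both segments of the utterance sit one unit above the baseform's segment averages, so both displacement vectors come out as `[1.0, 1.0]`.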

6. A speech recognition system performing a frequency analysis of an input speech for each period to obtain feature vectors, producing the corresponding label train using a vector quantization code book, matching a plurality of word baseforms expressed by a train of Markov models each corresponding to labels, with said label train and recognizing the input speech on the basis of the matching result, comprising:
- a means for producing a representative value of feature vectors in each of a plurality of word input speeches;
  
  a means for producing a representative value of feature vectors in the word baseform corresponding to said word input speech, based upon prototype vectors of said vector quantization code book;
  
  a means for producing a displacement vector indicating the displacement between the representative value of each word input speech and the representative value of the corresponding word baseform;
  
  a means for storing the degree of relation between said each word input speech and each label in the vector quantization code book; and
  
  a prototype adaptation means for correcting a prototype vector of each label in the label group of said vector quantization code book by said each displacement vector in accordance with the degree of relation between the label and the word input speech.

7. A speaker-adaptable speech recognition apparatus comprising:
- means for measuring the value of at least one feature of an utterance, said utterance occurring over a series of successive time intervals of equal duration, said means measuring the feature value of the utterance during each time interval to produce a series of feature vector signals representing the feature values;
  
  means for storing a finite set of prototype vector signals, each prototype vector signal having at least one parameter having a prior value;
  
  means for comparing the feature value of each feature vector signal, in the series of feature vector signals produced by the measuring means as a result of the utterance, to the prior parameter values of the prototype vector signals to determine, for each feature vector signal, the closest associated prototype vector signal, to produce an utterance-based series of prototype vector signals;
  
  means for generating a correlation signal having a value proportional to the correlation between the utterance-based series of prototype vector signals and a first prototype signal;
  
  means for modeling the utterance with a model-based series of prototype vector signals;
  
  means for calculating a displacement vector signal having a value representing the distance between the series of feature vector signals and the series of model-based prototype vector signals; and
  
  means for providing the first prototype signal with a posterior parameter value equal to its prior parameter value plus an offset proportional to the product of the value of the displacement vector signal multiplied by the value of the correlation signal.
- Dependent Claims (8, 9)
- - 8. An apparatus as claimed in claim 7, characterized in that the displacement vector signal has a value representing the distance between an average of the series of feature vector signals and an average of the series of model-based prototype vector signals.
  - 9. An apparatus as claimed in claim 8, characterized in that the correlation signal has a value proportional to the probability of observing the utterance-based series of prototype vector signals, given the occurrence of an utterance having feature values closest to the first prototype signal.
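
The chain in claims 7 and 8 (label each feature vector with its closest prototype, compare averages, then shift one prototype) could look like the sketch below. Several choices here are assumptions rather than anything the claims specify: squared Euclidean distance as the closeness measure, a proportionality constant of 1 in the offset, a correlation value supplied by the caller, and all function names.

```python
def nearest_prototype(feature, prototypes):
    """Index of the prototype closest to the feature vector
    (squared Euclidean distance is an assumed metric)."""
    def sq_dist(proto):
        return sum((f - p) ** 2 for f, p in zip(feature, proto))
    return min(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k]))


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def adapt_prototype(features, model_labels, prototypes, target, correlation):
    """Posterior value of prototypes[target]: prior + correlation * displacement.

    The displacement follows claim 8: average feature vector minus the
    average of the model-based prototype vectors.
    """
    feature_mean = mean_vector(features)
    model_mean = mean_vector([prototypes[k] for k in model_labels])
    displacement = [f - m for f, m in zip(feature_mean, model_mean)]
    return [p + correlation * d
            for p, d in zip(prototypes[target], displacement)]


# Toy data (invented): two prototypes and a two-frame utterance.
prototypes = [[0.0, 0.0], [4.0, 4.0]]
features = [[1.0, 1.0], [3.0, 5.0]]
labels = [nearest_prototype(f, prototypes) for f in features]
posterior = adapt_prototype(features, [0, 1], prototypes,
                            target=0, correlation=0.5)
```

Here `labels` is the utterance-based series of prototype indices, while `[0, 1]` stands in for the model-based series; in a real system the latter would come from the word's Markov model rather than being written by hand.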

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Nishimura, Masafumi
Primary Examiner(s): Kemeny, Emanuel S.
Application Number: US07/485,402
Time in Patent Office: 553 Days
Field of Search: 381/43, 364/513.5
US Class Current: 704/256.4
CPC Class Codes:
G10L 15/07 to the speaker
G10L 15/144 Training of HMMs
