Combining frequency warping and spectral shaping in HMM based speech recognition
Abstract
Frequency warping approaches to speaker normalization have been proposed and evaluated on various speech recognition tasks. In all cases, frequency warping was found to significantly improve recognition performance by reducing the mismatch between test utterances presented to the recognizer and the speaker independent HMM model. This invention relates to a procedure which compensates utterances by simultaneously scaling the frequency axis and reshaping the spectral energy contour. This procedure is shown to reduce the error rate in a telephone based connected digit recognition task by 30%.
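The idea of compensating an utterance by scaling the frequency axis and reshaping the spectral contour at the same time can be illustrated with a small sketch. The code below is not the patented procedure: it assumes log-spectral frames, a linear warp of the frequency axis, an additive per-bin bias as the simplest case of a linear transformation, and a fixed sequence of template frames standing in for aligned HMM state means. All function names (`warp_frames`, `fit_bias`, `joint_search`, `make_demo`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def warp_frames(frames, alpha):
    """Resample each frame so that output bin f reads the input at alpha * f."""
    n_bins = frames.shape[1]
    freqs = np.linspace(0.0, 1.0, n_bins)        # normalized frequency axis
    src = np.clip(alpha * freqs, 0.0, 1.0)       # clamp at the band edge
    return np.stack([np.interp(src, freqs, frame) for frame in frames])

def fit_bias(warped, templates):
    """Least-squares additive bias per frequency bin (spectral shaping term)."""
    return (templates - warped).mean(axis=0)

def joint_search(frames, templates, alphas):
    """For each candidate warp, fit the bias, then keep the best-scoring pair."""
    best = None
    for alpha in alphas:
        warped = warp_frames(frames, alpha)
        bias = fit_bias(warped, templates)
        err = float(np.sum((warped + bias - templates) ** 2))
        if best is None or err < best[2]:
            best = (float(alpha), bias, err)
    return best  # (alpha, bias, squared error)

def make_demo(true_alpha=1.10, n_frames=20, n_bins=257):
    """Synthetic data (assumption): one moving spectral peak plus a fixed tilt."""
    freqs = np.linspace(0.0, 1.0, n_bins)
    centers = np.linspace(0.3, 0.6, n_frames)[:, None]
    templates = np.exp(-((freqs - centers) / 0.05) ** 2)
    tilt = -2.0 * freqs                           # channel-like spectral tilt
    frames = np.exp(-((freqs / true_alpha - centers) / 0.05) ** 2) + tilt
    return frames, templates

if __name__ == "__main__":
    frames, templates = make_demo()
    alpha, bias, err = joint_search(frames, templates, np.linspace(0.88, 1.12, 13))
    print("selected warping factor:", round(alpha, 2))
    print("residual error:", err)
```

Because the bias is re-estimated for every candidate warping factor, the score of each candidate already reflects the best spectral shaping available for it, which is the sense in which the two compensations are optimized together here. Applying the bias to the features, as in this sketch, is equivalent to shifting the template means by the opposite amount.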
20 Claims
1. A signal processing method for recognizing unknown speech signals, comprising the following steps:
(A) receiving an unknown speech signal;
(B) generating a frequency warped sequence of feature vectors characterizing the unknown speech signal;
(C) simultaneously adapting a set of recognition unit models to the unknown speech signal using a linear transformation while generating said frequency warped sequence of feature vectors; and
(D) recognizing the unknown speech signal based on said frequency warped sequence of feature vectors and the set of adapted recognition unit models.

9. A signal processing method for recognizing unknown speech signals, comprising the following steps:
(A) receiving an unknown speech signal;
(B) generating a frequency warped sequence of feature vectors characterizing the unknown speech signal;
(C) applying a linear transformation to said frequency warped sequence of feature vectors;
(D) jointly optimizing generating said frequency warped sequence of feature vectors and applying the linear transformation; and
(E) recognizing the unknown speech signal based on the linearly transformed frequency warped sequence of feature vectors and a set of recognition unit models.

14. A speech recognition system, comprising:
an acoustic transducer capable of receiving sound waves representing unknown speech and converting the sound waves into an electrical unknown speech signal;
a feature extractor coupled to the acoustic transducer, wherein the feature extractor generating a frequency warped sequence of feature vectors based on the unknown speech signal, the feature vectors of said frequency warped sequence being warped according to a warping factor;
a memory means for storing a set of recognition unit models;
a model adaptation processor coupled to the feature extractor and the memory means, wherein the model adaptation processor simultaneously adapting the set of recognition unit models to the unknown speech signal using a linear transformation based on said frequency warped sequence of feature vectors while said frequency warped sequence of feature vectors is generated; and
a recognizer coupled to the feature extractor and the memory means, wherein the recognizer recognizing the unknown speech signal based on said frequency warped sequence of feature vectors and the set of adapted recognition unit models.
Specification