Unsupervised incremental adaptation using maximum likelihood spectral transformation
Abstract
A maximum likelihood spectral transformation (MLST) technique is proposed for rapid speech recognition under mismatched training and testing conditions. Speech feature vectors of real-time utterances are transformed in a linear spectral domain such that the likelihood of the utterances increases after the transformation. Cepstral vectors are then computed from the transformed spectra. The MLST function used for the spectral transformation is configured to handle both convolutional and additive noise. Because the function has only a small number of parameters to estimate, a few utterances suffice for accurate adaptation, essentially eliminating the need for training speech data. Furthermore, both parameter estimation and spectral transformation can be computed efficiently in linear time. The techniques of the present invention are therefore well suited to rapid online adaptation.
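The method moves between two domains: original spectra are recovered from cepstral feature vectors and mean vectors, and cepstra are recomputed from the transformed spectra. A minimal sketch in Python with NumPy, assuming each cepstral vector is the full-length orthonormal DCT-II of log filterbank energies with no truncation or liftering, in which case the mapping is exactly invertible. The function names `dct_matrix`, `spectra_to_cepstra` and `cepstra_to_spectra` are hypothetical, and real front ends usually truncate cepstra, which makes the inversion only approximate.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C (n x n): cepstrum = C @ log_spectrum."""
    k = np.arange(n)[:, None]   # cepstral (quefrency) index, one per row
    m = np.arange(n)[None, :]   # spectral channel index, one per column
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    c[0, :] /= np.sqrt(2.0)     # rescale the zeroth row so that C @ C.T = I
    return c


def spectra_to_cepstra(spec: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Linear spectra (frames x n) -> cepstra; the floor keeps the log finite."""
    c = dct_matrix(spec.shape[-1])
    return np.log(np.maximum(spec, floor)) @ c.T


def cepstra_to_spectra(cep: np.ndarray) -> np.ndarray:
    """Cepstra (frames x n) -> linear spectra; C is orthonormal, so C^-1 = C.T."""
    c = dct_matrix(cep.shape[-1])
    return np.exp(cep @ c)
```

Under these assumptions, `cepstra_to_spectra` would be applied to both a frame and its aligned mean vector before estimation, and `spectra_to_cepstra` to the transformed spectra afterwards.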
17 Claims
1. A method of adapting a speech recognition system to speech data provided to the speech recognition system, the method comprising the steps of:
computing alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
computing original spectra for each feature vector and corresponding mean vector;
estimating one or more transformation parameters which maximize a likelihood of an utterance; and
transforming a current feature vector using the estimated transformation parameters and maximum likelihood criteria, the transformation being performed in a linear spectral domain;
wherein the step of estimating the transformation parameters further comprises the step of estimating convolutional noise $N_i^{\alpha}$ and additive noise $N_i^{\beta}$ for each $i$th component of a speech vector corresponding to the speech data provided to the speech recognition system. (Dependent claims 2 through 8)
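The claim calls for estimating a convolutional term $N_i^{\alpha}$ and an additive term $N_i^{\beta}$ for each component, but this text does not reproduce the estimation formulas. As an illustration only, assume the aligned test spectrum relates to the model mean spectrum by $x_{t,i} = N_i^{\alpha}\mu_{t,i} + N_i^{\beta} + e_{t,i}$ with i.i.d. Gaussian error. Under that simplified model the maximum likelihood estimates reduce to per-component least squares, computed in a single pass that is linear in the number of frames. This surrogate is not the recognizer-likelihood objective of the patent (which would also involve model variances and the transformation's Jacobian), and `estimate_noise_params` is a hypothetical name.

```python
import numpy as np


def estimate_noise_params(x: np.ndarray, mu: np.ndarray, eps: float = 1e-8):
    """Per-component estimates of (alpha_i, beta_i), i.e. (N_i^alpha, N_i^beta).

    x  : (T, D) linear spectra of the test frames.
    mu : (T, D) linear spectra of the aligned model mean vectors.
    Assumed (hypothetical) model: x[t, i] = alpha[i] * mu[t, i] + beta[i] + e,
    with e i.i.d. Gaussian, so the ML estimate is ordinary least squares.
    """
    mu_bar = mu.mean(axis=0)
    x_bar = x.mean(axis=0)
    cov = ((mu - mu_bar) * (x - x_bar)).mean(axis=0)   # covariance of mu and x
    var = ((mu - mu_bar) ** 2).mean(axis=0)            # variance of mu
    alpha = cov / np.maximum(var, eps)                 # convolutional gain
    beta = x_bar - alpha * mu_bar                      # additive offset
    return alpha, beta
```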
9. A method of adapting a speech recognition system to speech data provided to the speech recognition system, the method comprising the steps of:
computing alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
computing original spectra for each feature vector and corresponding mean vector;
estimating one or more transformation parameters which maximize a likelihood of an utterance; and
transforming a current feature vector using the estimated transformation parameters and maximum likelihood criteria, the transformation being performed in a linear spectral domain;
wherein the step of transforming the current feature vector further comprises the step of determining [equation not reproduced], where $x_i(f)$ is an $i$th component of a speech vector corresponding to the speech data provided to the speech recognition system, $N_i^{\alpha}$ is convolutional noise and $N_i^{\beta}$ is additive noise of the $i$th component of the speech vector.
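The formula determined in this claim is not reproduced here. As a hedged sketch, assume the test spectrum relates to the training-condition spectrum $s_i(f)$ by $x_i(f) = N_i^{\alpha} s_i(f) + N_i^{\beta}$; inverting that model component-wise gives $\hat{x}_i(f) = (x_i(f) - N_i^{\beta}) / N_i^{\alpha}$, which maps a test spectrum back toward the training condition at constant cost per component. The function name `transform_spectra` and the flooring value are assumptions, and $N_i^{\alpha}$ must be nonzero.

```python
import numpy as np


def transform_spectra(x, alpha, beta, floor: float = 1e-10) -> np.ndarray:
    """Component-wise inverse of the assumed model x = alpha * s + beta.

    x     : (T, D) linear spectra of the current utterance.
    alpha : (D,) convolutional gains N_i^alpha (must be nonzero).
    beta  : (D,) additive terms N_i^beta.
    Results are floored so that a later log (for cepstra) stays finite.
    """
    x = np.asarray(x, dtype=float)
    x_hat = (x - beta) / alpha
    return np.maximum(x_hat, floor)
```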
10. Apparatus for adapting a speech recognition system to speech data provided to the speech recognition system, the apparatus comprising:
at least one processing device operative to:
(i) compute alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
(ii) compute original spectra for each feature vector and a corresponding mean vector;
(iii) estimate one or more transformation parameters which maximize a likelihood of an utterance; and
(iv) transform a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain;
wherein the operation of transforming the current feature vector includes the step of determining [equation not reproduced], where $x_i(f)$ is an $i$th component of a speech vector corresponding to the speech data provided to the speech recognition system, $N_i^{\alpha}$ is convolutional noise and $N_i^{\beta}$ is additive noise of the $i$th component of the speech vector.
11. Apparatus for adapting a speech recognition system to speech data provided to the speech recognition system, the apparatus comprising:
at least one processing device operative to:
(i) compute alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
(ii) compute original spectra for each feature vector and a corresponding mean vector;
(iii) estimate one or more transformation parameters which maximize a likelihood of an utterance; and
(iv) transform a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain;
wherein the operation of estimating the transformation parameters further includes the operation of estimating convolutional noise $N_i^{\alpha}$ and additive noise $N_i^{\beta}$ for each $i$th component of a speech vector provided to the speech recognition system. (Dependent claims 12 through 17)
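The title refers to unsupervised incremental adaptation, and the abstract to rapid online adaptation. One way to make per-component estimation incremental is sketched below, assuming the simplified model $x_{t,i} = N_i^{\alpha}\mu_{t,i} + N_i^{\beta} + e_{t,i}$ with i.i.d. Gaussian error, whose least-squares (maximum likelihood) estimates depend only on five running sums per component. Each utterance then costs time linear in its number of frames, and re-estimation costs time linear in the vector dimension. The class name `IncrementalNoiseEstimator` is hypothetical, and in unsupervised use the aligned means are assumed here to come from the recognizer's own first-pass hypothesis.

```python
import numpy as np


class IncrementalNoiseEstimator:
    """Running per-component statistics for the hypothetical model
    x[t, i] = alpha[i] * mu[t, i] + beta[i] + Gaussian noise.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.n = 0                      # number of frames seen so far
        self.s_mu = np.zeros(dim)       # sum of mu
        self.s_x = np.zeros(dim)        # sum of x
        self.s_mumu = np.zeros(dim)     # sum of mu * mu
        self.s_mux = np.zeros(dim)      # sum of mu * x

    def add_utterance(self, x, mu) -> None:
        """Accumulate one utterance; x and mu are (T, D) aligned linear spectra."""
        x = np.asarray(x, dtype=float)
        mu = np.asarray(mu, dtype=float)
        self.n += x.shape[0]
        self.s_mu += mu.sum(axis=0)
        self.s_x += x.sum(axis=0)
        self.s_mumu += (mu * mu).sum(axis=0)
        self.s_mux += (mu * x).sum(axis=0)

    def estimate(self, eps: float = 1e-8):
        """Return (alpha, beta); the identity transform if no data has been seen."""
        if self.n == 0:
            return np.ones(self.dim), np.zeros(self.dim)
        mu_bar = self.s_mu / self.n
        x_bar = self.s_x / self.n
        cov = self.s_mux / self.n - mu_bar * x_bar
        var = self.s_mumu / self.n - mu_bar ** 2
        alpha = cov / np.maximum(var, eps)
        beta = x_bar - alpha * mu_bar
        return alpha, beta
```

Returning the identity transform before any data arrives means the first utterance is decoded unadapted, which matches the incremental setting in which parameters improve as utterances accumulate.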
Specification