Unsupervised incremental adaptation using maximum likelihood spectral transformation

US 6,999,926 B2
Filed: 07/23/2001
Issued: 02/14/2006
Est. Priority Date: 11/16/2000
Status: Expired due to Fees

Abstract

A maximum likelihood spectral transformation (MLST) technique is proposed for rapid speech recognition under mismatched training and testing conditions. Speech feature vectors of real-time utterances are transformed in a linear spectral domain such that a likelihood of the utterances is increased after the transformation. Cepstral vectors are computed from the transformed spectra. The MLST function used for the spectral transformation is configured to handle both convolutional and additive noise. Since the function has a small number of parameters to be estimated, only a few utterances are required for accurate adaptation, thus essentially eliminating the need for training speech data. Furthermore, the computation for parameter estimation and spectral transformation can be done efficiently in linear time. Therefore, the techniques of the present invention are well-suited for rapid online adaptation.
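The flow described in the abstract (cepstrum to linear spectrum, per-component transformation, back to cepstrum) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes cepstra are the orthonormal DCT of the log spectrum with no truncation, and the names `dct`, `idct`, `adapt_cepstrum`, `scale`, `offset`, and `floor` are hypothetical.

```python
import math
from typing import List, Sequence


def dct(v: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II (assumed here to map a log spectrum to a cepstrum)."""
    n = len(v)
    out = []
    for k in range(n):
        s = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(s * sum(v[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                           for j in range(n)))
    return out


def idct(c: Sequence[float]) -> List[float]:
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for j in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                       * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                       for k in range(n)))
    return out


def adapt_cepstrum(cepstrum: Sequence[float],
                   scale: Sequence[float],
                   offset: Sequence[float],
                   floor: float = 1e-10) -> List[float]:
    """Apply a per-component affine transform in the linear spectral domain."""
    # Cepstrum -> log spectrum -> linear spectrum.
    spectrum = [math.exp(v) for v in idct(cepstrum)]
    # Affine transform per spectral component; the floor is an assumption
    # added so the logarithm below stays defined.
    transformed = [max(a * s + b, floor)
                   for s, a, b in zip(spectrum, scale, offset)]
    # Linear spectrum -> log spectrum -> cepstrum.
    return dct([math.log(s) for s in transformed])
```

With `scale` set to all ones and `offset` to all zeros the function returns the input cepstrum up to rounding, which is a quick sanity check on the round trip.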

17 Claims

1. A method of adapting a speech recognition system to speech data provided to the speech recognition system, the method comprising the steps of:
- computing alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
  
  computing an original spectra for each feature vector and corresponding mean vector;
  
  estimating one or more transformation parameters which maximize a likelihood of an utterance; and
  
  transforming a current feature vector using the estimated transformation parameters and maximum likelihood criteria, the transformation being performed in a linear spectral domain;
  
  wherein the step of estimating the transformation parameters further comprises the step of estimating convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.
  - 2. The method of claim 1, wherein the step of transforming the current feature vector is performed in feature space.
  - 3. The method of claim 1, wherein the step of transforming the current feature vector is performed in model space.
  - 4. The method of claim 1, wherein the maximum likelihood criteria is a maximum likelihood spectral transformation (MLST).
  - 5. The method of claim 1, wherein the step of estimating one or more transformation parameters which maximize a likelihood of an utterance further comprises the step of computing likelihood of utterance information corresponding to a previous feature vector transformation.
  - 6. The method of claim 1, wherein the step of computing alignment information is performed using a Baum-Welch algorithm.
  - 7. The method of claim 1, wherein the step of estimating the transformation parameters further comprises the step of defining a diagonal matrix A with A_ii = 1/N_i^α, and defining b_i = −N_i^β/N_i^α.
  - 8. The method of claim 7, further comprising the steps of:
    - determining A_ii in accordance with an expression $A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\varepsilon)} m_{t,i}^{(\varepsilon)} - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} m_{t,i}^{(\varepsilon)}}{T \sum_{t} \left(x_{t,i}^{(\varepsilon)}\right)^{2} - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} x_{t,i}^{(\varepsilon)}}$; and
    - determining b_i in accordance with an expression $b_{i} = \frac{- A_{ii} \sum_{t} x_{t,i}^{(\varepsilon)} + \sum_{t} m_{t,i}^{(\varepsilon)}}{T}$;

    where x_t,i^(ε) and m_t,i^(ε) are sub-linear spectral values of a feature vector and corresponding mean vector, respectively, for each ith component of the speech vector.
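The closed-form expressions of claim 8 can be evaluated directly from per-frame sums. The sketch below is illustrative only: the function name `estimate_mlst_params`, the list-of-lists layout (`x[t][i]`, `m[t][i]`), and the fallback to A_ii = 1 when the denominator is zero are assumptions that do not come from the claim.

```python
from typing import List, Sequence, Tuple


def estimate_mlst_params(
    x: Sequence[Sequence[float]],
    m: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Evaluate the claim 8 expressions for A_ii and b_i.

    x[t][i]: spectral value of the feature vector at frame t, component i.
    m[t][i]: spectral value of the aligned mean vector at frame t, component i.
    """
    T = len(x)
    if T == 0 or len(m) != T:
        raise ValueError("x and m need the same non-zero number of frames")
    A: List[float] = []
    b: List[float] = []
    for i in range(len(x[0])):
        sx = sum(x[t][i] for t in range(T))             # sum_t x
        sm = sum(m[t][i] for t in range(T))             # sum_t m
        sxm = sum(x[t][i] * m[t][i] for t in range(T))  # sum_t x*m
        sxx = sum(x[t][i] ** 2 for t in range(T))       # sum_t x^2
        denom = T * sxx - sx * sx
        # Assumption (not in the claim): keep the identity scale when the
        # component is constant over all frames and the denominator vanishes.
        a_ii = (T * sxm - sx * sm) / denom if denom != 0.0 else 1.0
        A.append(a_ii)
        b.append((-a_ii * sx + sm) / T)
    return A, b
```

Each component needs only four running sums over the T frames, which is consistent with the abstract's statement that parameter estimation runs in linear time.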

9. A method of adapting a speech recognition system to speech data provided to the speech recognition system, the method comprising the steps of:
- computing alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
  
  computing an original spectra for each feature vector and corresponding mean vector;
  
  estimating one or more transformation parameters which maximize a likelihood of an utterance; and
  
  transforming a current feature vector using the estimated transformation parameters and maximum likelihood criteria, the transformation being performed in a linear spectral domain;
  
  wherein the step of transforming the current feature vector further comprises the step of determining ${\dot{x}}_{i}^{(f)} = \frac{1}{N_{i}^{\alpha}} x_{i}^{(f)} - \frac{N_{i}^{\beta}}{N_{i}^{\alpha}}$, where x_i^(f) is an ith component of a speech vector corresponding to the speech data provided to the speech recognition system, N_i^α is convolutional noise and N_i^β is additive noise of the ith component of the speech vector.
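The transformation recited in claim 9 is a per-component affine map. As a hedged sketch, the function below applies it to one spectral vector; the name `mlst_transform`, the argument names, and the positive `floor` (added so a later logarithm would stay defined) are assumptions rather than part of the claim.

```python
from typing import List, Sequence


def mlst_transform(
    x_f: Sequence[float],
    n_alpha: Sequence[float],
    n_beta: Sequence[float],
    floor: float = 1e-10,
) -> List[float]:
    """Apply x_i / N_i^alpha - N_i^beta / N_i^alpha to each spectral component."""
    out: List[float] = []
    for x_i, na, nb in zip(x_f, n_alpha, n_beta):
        if na == 0.0:
            raise ValueError("convolutional noise term must be non-zero")
        y = x_i / na - nb / na
        # Assumption: clamp to a small positive value; the claim has no floor.
        out.append(max(y, floor))
    return out


# Usage: if an observation is modeled as N^alpha * clean + N^beta, the
# transform recovers the clean values when the noise terms are known.
clean = [1.0, 4.0]
n_alpha = [2.0, 0.5]
n_beta = [0.5, 1.0]
noisy = [a * c + n for c, a, n in zip(clean, n_alpha, n_beta)]
restored = mlst_transform(noisy, n_alpha, n_beta)
```

Read this way, the map inverts a noise model in which convolutional noise scales each spectral component and additive noise shifts it.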

10. Apparatus for adapting a speech recognition system to speech data provided to the speech recognition system, the apparatus comprising:
- at least one processing device operative to:
  
  (i) compute alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
  
  (ii) compute an original spectra for each feature vector and a corresponding mean vector;
  
  (iii) estimate one or more transformation parameters which maximize a likelihood of an utterance; and
  
  (iv) transform a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain;
  
  wherein the operation of transforming the current feature vector includes the step of determining ${\dot{x}}_{i}^{(f)} = \frac{1}{N_{i}^{\alpha}} x_{i}^{(f)} - \frac{N_{i}^{\beta}}{N_{i}^{\alpha}}$, where x_i^(f) is an ith component of a speech vector corresponding to the speech data provided to the speech recognition system, N_i^α is convolutional noise and N_i^β is additive noise of the ith component of the speech vector.

11. Apparatus for adapting a speech recognition system to speech data provided to the speech recognition system, the apparatus comprising:
- at least one processing device operative to:
  
  (i) compute alignment information between the speech recognition system and feature vectors associated with the speech data provided to the speech recognition system;
  
  (ii) compute an original spectra for each feature vector and a corresponding mean vector;
  
  (iii) estimate one or more transformation parameters which maximize a likelihood of an utterance; and
  
  (iv) transform a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain;
  
  wherein the operation of estimating the transformation parameters further includes the operation of estimating convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector provided to the speech recognition system.
  - 12. The apparatus of claim 11, wherein the operation of transforming the current feature vector is performed in a feature space.
  - 13. The apparatus of claim 11, wherein the operation of transforming the current feature vector is performed in a model space.
  - 14. The apparatus of claim 11, wherein the spectral transformation employed in the operation of transforming the current feature vector is a maximum likelihood spectral transformation (MLST).
  - 15. The apparatus of claim 11, wherein the operation of estimating one or more transformation parameters which maximize a likelihood of an utterance further comprises the operation of computing likelihood of utterance information corresponding to a previous feature vector transformation.
  - 16. The apparatus of claim 11, wherein the operation of estimating the transformation parameters further includes the operation of defining a diagonal matrix A with A_ii = 1/N_i^α, and defining b_i = −N_i^β/N_i^α.
  - 17. The apparatus of claim 16, wherein the operation of estimating the transformation parameters further comprises the operation of:
    - determining A_ii in accordance with an expression $A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\varepsilon)} m_{t,i}^{(\varepsilon)} - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} m_{t,i}^{(\varepsilon)}}{T \sum_{t} \left(x_{t,i}^{(\varepsilon)}\right)^{2} - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} x_{t,i}^{(\varepsilon)}}$; and
    - determining b_i in accordance with an expression $b_{i} = \frac{- A_{ii} \sum_{t} x_{t,i}^{(\varepsilon)} + \sum_{t} m_{t,i}^{(\varepsilon)}}{T}$;

    where x_t,i^(ε) and m_t,i^(ε) are sub-linear spectral values of a feature vector and corresponding mean vector, respectively, for each ith component of the speech vector.
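One way to read the expressions for A_ii and b_i in claims 8 and 17 is as the closed-form solution of a per-component least-squares fit of the aligned mean values to the transformed feature values. The derivation below is a sketch under that assumption: the squared-error objective J_i is introduced here for illustration and is not stated in the claims, which may rest on a different maximum likelihood formulation.

```latex
% Assumed objective for component i (not stated in the claims):
J_i(A_{ii}, b_i) = \sum_{t=1}^{T}
  \left( A_{ii}\, x_{t,i}^{(\varepsilon)} + b_i - m_{t,i}^{(\varepsilon)} \right)^{2}

% Setting \partial J_i / \partial b_i = 0:
b_i = \frac{-A_{ii} \sum_{t} x_{t,i}^{(\varepsilon)} + \sum_{t} m_{t,i}^{(\varepsilon)}}{T}

% Setting \partial J_i / \partial A_{ii} = 0:
A_{ii} \sum_{t} \left( x_{t,i}^{(\varepsilon)} \right)^{2}
  + b_i \sum_{t} x_{t,i}^{(\varepsilon)}
  = \sum_{t} x_{t,i}^{(\varepsilon)} m_{t,i}^{(\varepsilon)}

% Substituting b_i and multiplying through by T:
A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\varepsilon)} m_{t,i}^{(\varepsilon)}
               - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} m_{t,i}^{(\varepsilon)}}
              {T \sum_{t} \left( x_{t,i}^{(\varepsilon)} \right)^{2}
               - \sum_{t} x_{t,i}^{(\varepsilon)} \sum_{t} x_{t,i}^{(\varepsilon)}}
```

Both results match the claimed expressions term by term, so under this reading each component reduces to a simple linear regression over the T aligned frames.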

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Lubensky, David M.; Yuk, Dongsuk
Primary Examiner(s): McFadden, Susan
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/910,985
Publication Number: US 20020091521A1
Time in Patent Office: 1,667 Days
Field of Search: 704/245, 704/243, 704/222
US Class Current: 704/244
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/20   Speech recognition techniqu...
G10L 21/0216   characterised by the method...
