Unsupervised incremental adaptation using maximum likelihood spectral transformation

US 7,269,555 B2
Filed: 08/30/2005
Issued: 09/11/2007
Est. Priority Date: 11/16/2000
Status: Expired due to Term

Abstract

In a speech recognition system, a method of transforming speech feature vectors associated with speech data provided to the speech recognition system includes the steps of receiving likelihood of utterance information corresponding to a previous feature vector transformation, estimating one or more transformation parameters based, at least in part, on the likelihood of utterance information corresponding to a previous feature vector transformation, and transforming a current feature vector based on maximum likelihood criteria and/or the estimated transformation parameters, the transformation being performed in a linear spectral domain. The step of estimating the one or more transformation parameters includes the step of estimating convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.

25 Citations

15 Claims

1. A method of transforming speech feature vectors associated with speech data provided to a speech recognition system, the method comprising the steps of:
- receiving likelihood of utterance information corresponding to a previous feature vector transformation;
  
  estimating one or more transformation parameters as a function of the likelihood of utterance information corresponding to a previous feature vector transformation; and
  
  transforming a current feature vector based on at least one of maximum likelihood criteria and the estimated one or more transformation parameters, the transformation being performed in a linear spectral domain;
  
  wherein the step of estimating the one or more transformation parameters comprises the step of estimating convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.
- - 2. The method of claim 1, wherein the step of transforming the current feature vector is performed in feature space.
  - 3. The method of claim 1, wherein the step of transforming the current feature vector is performed in model space.
  - 4. The method of claim 1, wherein the maximum likelihood criteria is a maximum likelihood spectral transformation (MLST).
  - 5. The method of claim 1, wherein the step of transforming the current feature vector further comprises the step of determining ${\dot{x}}_{i}^{(f)} = \frac{1}{N_{i}^{\alpha}} x_{i}^{(f)} - \frac{N_{i}^{\beta}}{N_{i}^{\alpha}}$, where x_i^(f) is an ith component of a speech vector corresponding to the speech data provided to the speech recognition system, N_i^α is convolutional noise and N_i^β is additive noise of the ith component of the speech vector.
  - 6. The method of claim 1, wherein the step of estimating the one or more transformation parameters further comprises the step of defining a diagonal matrix A with A_ii=1/N_i^α, and defining b_i=−N_i^β/N_i^α.
  - 7. The method of claim 6, further comprising the steps of:
    - determining A_ii in accordance with an expression $A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\epsilon)} m_{t,i}^{(\epsilon)} - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} m_{t,i}^{(\epsilon)}}{T \sum_{t} \left(x_{t,i}^{(\epsilon)}\right)^{2} - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} x_{t,i}^{(\epsilon)}}$; and
    - determining b_i in accordance with an expression $b_{i} = \frac{- A_{ii} \sum_{t} x_{t,i}^{(\epsilon)} + \sum_{t} m_{t,i}^{(\epsilon)}}{T}$;
      where x_t,i^(ε) and m_t,i^(ε) are sub-linear spectral values of a feature vector and corresponding mean vector, respectively, for each ith component of the speech vector.
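
The expressions recited in claims 5 to 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `estimate_affine`, `transform`, and `noise_from_affine` are hypothetical, the pairing of each frame with a mean vector is taken as given, and the fallback for a zero denominator is a choice made here rather than anything the claims specify.

```python
from typing import List, Sequence, Tuple


def estimate_affine(x: Sequence[Sequence[float]],
                    m: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-component A_ii and b_i over T frames, using the claim 7 expressions.

    x[t][i] is the i-th spectral value of frame t and m[t][i] the matching
    mean-vector value. How frames get paired with means is assumed here.
    """
    T = len(x)
    dim = len(x[0])
    A, b = [], []
    for i in range(dim):
        sx = sum(x[t][i] for t in range(T))
        sm = sum(m[t][i] for t in range(T))
        sxm = sum(x[t][i] * m[t][i] for t in range(T))
        sxx = sum(x[t][i] ** 2 for t in range(T))
        denom = T * sxx - sx * sx
        # Assumption: keep a unit scale when the component never varies.
        a_ii = (T * sxm - sx * sm) / denom if denom != 0.0 else 1.0
        A.append(a_ii)
        b.append((-a_ii * sx + sm) / T)
    return A, b


def transform(frame: Sequence[float], A: Sequence[float],
              b: Sequence[float]) -> List[float]:
    """Claim 5 written with claim 6's A and b: x' = A_ii * x + b_i."""
    return [a * v + c for a, v, c in zip(A, frame, b)]


def noise_from_affine(A: Sequence[float],
                      b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Invert claim 6: N_alpha = 1 / A_ii and N_beta = -b_i / A_ii."""
    return [1.0 / a for a in A], [-c / a for a, c in zip(A, b)]


# Toy data: means m distorted by x = N_alpha * m + N_beta, with
# (N_alpha, N_beta) = (2, 3) for component 0 and (0.5, 1) for component 1.
m = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
x = [[5.0, 2.0], [7.0, 3.0], [11.0, 5.0]]
A, b = estimate_affine(x, m)
restored = transform(x[0], A, b)
n_alpha, n_beta = noise_from_affine(A, b)
```

On this noise-free toy data the fit is exact, so `restored` matches `m[0]` and the recovered noise terms match the values used to build `x`. With real speech the means would come from the recognizer and the fit would only be approximate.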

8. An article of manufacture for transforming speech feature vectors associated with speech data provided to a speech recognition system, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- receiving likelihood of utterance information corresponding to a previous feature vector transformation;
  
  estimating one or more transformation parameters as a function of the likelihood of utterance information corresponding to a previous feature vector transformation; and
  
  transforming a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain;
  
  wherein the step of estimating the one or more transformation parameters comprises the step of estimating convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.
- - 9. The article of claim 8, wherein the step of transforming the current feature vector is performed in a feature space.
  - 10. The article of claim 8, wherein the step of transforming the current feature vector is performed in a model space.
  - 11. The article of claim 8, wherein the maximum likelihood criteria is a maximum likelihood spectral transformation (MLST).
  - 12. The article of claim 8, wherein the step of transforming the current feature vector includes the step of determining ${\dot{x}}_{i}^{(f)} = \frac{1}{N_{i}^{\alpha}} x_{i}^{(f)} - \frac{N_{i}^{\beta}}{N_{i}^{\alpha}}$, where x_i^(f) is an ith component of a speech vector corresponding to the speech data provided to the speech recognition system, N_i^α is convolutional noise and N_i^β is additive noise of the ith component of the speech vector.
  - 13. The article of claim 8, wherein the step of estimating the one or more transformation parameters further includes the step of defining a diagonal matrix A with A_ii=1/N_i^α, and defining b_i=−N_i^β/N_i^α.
  - 14. The article of claim 13, wherein the step of estimating the one or more transformation parameters further comprises the steps of:
    - determining A_ii in accordance with an expression $A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\epsilon)} m_{t,i}^{(\epsilon)} - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} m_{t,i}^{(\epsilon)}}{T \sum_{t} \left(x_{t,i}^{(\epsilon)}\right)^{2} - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} x_{t,i}^{(\epsilon)}}$; and
    - determining b_i in accordance with an expression $b_{i} = \frac{- A_{ii} \sum_{t} x_{t,i}^{(\epsilon)} + \sum_{t} m_{t,i}^{(\epsilon)}}{T}$;
      where x_t,i^(ε) and m_t,i^(ε) are sub-linear spectral values of a feature vector and corresponding mean vector, respectively, for each ith component of the speech vector.
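
As a reading aid, the expressions for A_ii and b_i recited in claim 7 coincide with the ordinary least-squares fit of a line from the feature values x_t,i^(ε) to the mean values m_t,i^(ε) over T frames. The objective E below is an assumption introduced here for illustration; the claims themselves recite only the resulting expressions.

```latex
\begin{align*}
E(A_{ii}, b_i) &= \sum_{t=1}^{T} \left( A_{ii}\, x_{t,i}^{(\epsilon)} + b_i - m_{t,i}^{(\epsilon)} \right)^{2} \\
\frac{\partial E}{\partial b_i} = 0 \;&\Rightarrow\;
  b_i = \frac{-A_{ii} \sum_{t} x_{t,i}^{(\epsilon)} + \sum_{t} m_{t,i}^{(\epsilon)}}{T} \\
\frac{\partial E}{\partial A_{ii}} = 0 \;&\Rightarrow\;
  A_{ii} \sum_{t} \left(x_{t,i}^{(\epsilon)}\right)^{2} + b_i \sum_{t} x_{t,i}^{(\epsilon)}
  = \sum_{t} x_{t,i}^{(\epsilon)} m_{t,i}^{(\epsilon)} \\
\text{substituting } b_i \;&\Rightarrow\;
  A_{ii} = \frac{T \sum_{t} x_{t,i}^{(\epsilon)} m_{t,i}^{(\epsilon)}
                 - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} m_{t,i}^{(\epsilon)}}
                {T \sum_{t} \left(x_{t,i}^{(\epsilon)}\right)^{2}
                 - \sum_{t} x_{t,i}^{(\epsilon)} \sum_{t} x_{t,i}^{(\epsilon)}}
\end{align*}
```

The numerator of b_i therefore combines its two sums with a plus sign, which is the form that agrees with A_ii=1/N_i^α and b_i=−N_i^β/N_i^α when the data follow x = N_i^α m + N_i^β exactly.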

15. Apparatus for transforming speech feature vectors associated with speech data provided to a speech recognition system, the apparatus comprising:
- at least one processing device operative:
  
  (i) to receive likelihood of utterance information corresponding to a previous feature vector transformation;
  
  (ii) to estimate one or more transformation parameters based, at least in part, on the likelihood of utterance information corresponding to a previous feature vector transformation;
  
  (iii) to transform a current feature vector based on at least one of maximum likelihood criteria and the estimated transformation parameters, the transformation being performed in a linear spectral domain; and
  
  (iv) to estimate convolutional noise N_i^α and additive noise N_i^β for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.
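
The incremental flow recited in the independent claims (transform the current feature vector with parameters estimated from earlier recognition results, then re-estimate) might be organized as in the following sketch. It rests on assumptions that the claims do not spell out: the class `IncrementalAffineAdapter`, the function `adapt_stream`, and the `decode` callback are hypothetical names, and `decode` is assumed to return one mean vector per frame from the recognizer's output for the utterance just processed. Running sums are used so that no earlier frames need to be stored.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


class IncrementalAffineAdapter:
    """Keeps running sums so A_ii and b_i can be refreshed after each utterance."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.T = 0
        self.sx = [0.0] * dim
        self.sm = [0.0] * dim
        self.sxm = [0.0] * dim
        self.sxx = [0.0] * dim
        self.A = [1.0] * dim  # identity transform until statistics exist
        self.b = [0.0] * dim

    def transform(self, frame: Frame) -> List[float]:
        return [a * v + c for a, v, c in zip(self.A, frame, self.b)]

    def update(self, frames: Sequence[Frame], means: Sequence[Frame]) -> None:
        # Accumulate statistics from untransformed frames and their means.
        for x, m in zip(frames, means):
            self.T += 1
            for i in range(self.dim):
                self.sx[i] += x[i]
                self.sm[i] += m[i]
                self.sxm[i] += x[i] * m[i]
                self.sxx[i] += x[i] * x[i]
        for i in range(self.dim):
            denom = self.T * self.sxx[i] - self.sx[i] ** 2
            if denom != 0.0:  # assumption: keep old values if degenerate
                self.A[i] = (self.T * self.sxm[i] - self.sx[i] * self.sm[i]) / denom
                self.b[i] = (-self.A[i] * self.sx[i] + self.sm[i]) / self.T


def adapt_stream(utterances: Sequence[Sequence[Frame]],
                 decode: Callable[[List[List[float]]], Sequence[Frame]],
                 dim: int) -> Tuple[List[List[List[float]]], IncrementalAffineAdapter]:
    adapter = IncrementalAffineAdapter(dim)
    outputs = []
    for frames in utterances:
        # Transform with parameters estimated from previous utterances only.
        transformed = [adapter.transform(f) for f in frames]
        outputs.append(transformed)
        # Hypothetical recognizer call: per-frame mean vectors for this utterance.
        means = decode(transformed)
        adapter.update(frames, means)
    return outputs, adapter


# Toy run: the "decoder" is a stand-in that hands back known clean means.
clean = iter([[[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]], [[3.0, 6.0]]])
utterances = [[[5.0, 2.0], [7.0, 3.0], [11.0, 5.0]], [[9.0, 4.0]]]
outputs, adapter = adapt_stream(utterances, lambda _frames: next(clean), dim=2)
```

In this toy run the first utterance passes through unchanged because no statistics exist yet, while the second is corrected using parameters learned from the first. No transcript is supplied at any point, which is the sense in which such a loop is unsupervised.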

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Lubensky, David M.; Yuk, Dongsuk
Primary Examiner(s): Opsasnick, Michael N.
Application Number: US11/215,415
Publication Number: US 20060009972A1
Time in Patent Office: 742 Days
Field of Search: 704/245, 704/222, 704/244, 704/234
US Class Current: 704/234
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/20   Speech recognition techniqu...
G10L 21/0216   characterised by the method...
