Method and System for Selectively Biased Linear Discriminant Analysis in Automatic Speech Recognition Systems

US 20140058731A1
Filed: 08/23/2013
Published: 02/27/2014
Est. Priority Date: 08/24/2012
Status: Active Grant

Abstract

A system and method are presented for selectively biased linear discriminant analysis in automatic speech recognition systems. Linear Discriminant Analysis (LDA) may be used to improve the discrimination between the hidden Markov model (HMM) tied-states in the acoustic feature space. The between-class and within-class covariance matrices may be biased based on the observed recognition errors of the tied-states, such as shared HMM states of the context dependent tri-phone acoustic model. The recognition errors may be obtained from a trained maximum-likelihood acoustic model utilizing the tied-states which may then be used as classes in the analysis.
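The idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes frame-level feature vectors, integer tied-state labels, and a per-state error rate are already available. The weighting scheme (each class's contribution to both scatter matrices scaled by its error rate) is an assumption made for illustration, since the patent's biasing equations are not reproduced in this text; the function names `biased_scatter` and `lda_projection` and the `ridge` parameter are likewise hypothetical.

```python
import numpy as np

def biased_scatter(X, labels, error_rates):
    """Between-class and within-class scatter, weighted per class.

    X: (T, d) feature frames; labels: (T,) ints in [0, K);
    error_rates: (K,) per-class recognition error rates.
    ASSUMPTION: each class is weighted by its error rate, so poorly
    recognized classes influence the transform more. The patent's own
    equations are not available in this text.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    e = np.asarray(error_rates, dtype=float)
    d = X.shape[1]
    mu = X.mean(axis=0)                      # global mean vector
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for k in range(len(e)):
        X_k = X[labels == k]
        if len(X_k) == 0:
            continue                         # class never observed
        mu_k = X_k.mean(axis=0)              # class mean vector
        diff = (mu_k - mu)[:, None]
        centered = X_k - mu_k
        S_b += e[k] * (diff @ diff.T)
        S_w += e[k] * (centered.T @ centered) / len(X_k)
    return S_b, S_w

def lda_projection(S_b, S_w, n_components, ridge=1e-6):
    """Leading eigenvectors of S_w^{-1} S_b (ridge keeps S_w invertible)."""
    d = S_w.shape[0]
    M = np.linalg.solve(S_w + ridge * np.eye(d), S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real
```

Multiplying the feature matrix by the returned projection gives the lower-dimensional features on which a new acoustic model would be trained.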

32 Claims

1. A method for training an acoustic model using the maximum likelihood criteria, comprising the steps of:
- a. performing a forced alignment of speech training data;
  
  b. processing the training data and obtaining estimated scatter matrices, wherein said scatter matrices may comprise one or more of a between class scatter matrix and a within-class scatter matrix, from which mean vectors may be estimated;
  
  c. biasing the between class scatter matrix and the within-class scatter matrix;
  
  d. diagonalizing the between class scatter matrix and the within class scatter matrix and estimating eigen-vectors to produce transformed scatter matrices;
  
  e. obtaining new discriminative features using the estimated vectors, wherein said vectors correspond to the highest discrimination in the new space;
  
  f. training a new acoustic model based on said new discriminative features; and
  
  g. saving said acoustic model.
- - 2. The method of claim 1, wherein step (a) further comprises the step of using the current maximum likelihood acoustic model on the entire speech training data with a Hidden Markov Model-Gaussian Mixture Model.
  - 3. The training data of claim 2, wherein said data may consist of phonemes and triphones wherein:
    - a. a triphone's Hidden Markov Model states may be mapped to tied states;
      
      b. each feature frame may have a tied state class label; and
      
      c. said tied states may comprise unique classes between which the discrimination in an acoustic feature space is increased through selectively biased linear discriminant analysis.
  - 4. The method of claim 1, wherein step (b) further comprises the steps of:
    - a. performing tied triphone recognition on the training data using a trained model;
      
      b. recording a recognition error rate of each triphone tied state using a transcription of the training data;
      
      c. representing a segment of audio corresponding to a triphone with a 39 dimensional Mel-frequency cepstral coefficient feature vector and a first order derivative and a second order derivative;
      
      d. mapping training data internally to a tied-triphone state;
      
      e. forming a super vector with said Mel-frequency cepstral coefficient features;
      
      f. performing a forced Viterbi alignment to assign a tied state label to each frame in the training data; and
      
      g. estimating at least one of the between class and within-class scatter matrices.
  - 5. The method of claim 4, wherein the error rate of step (b) comprises, for i ∈ (1, 2, . . . , K), the fraction of the frames which have a class label ‘k’ as per the forced alignment but were misrecognized by the recognizer.
  - 6. The method of claim 4, wherein step (g) further comprises the steps of:
    - a. estimating a mean of the super vector using the tied state labels of the training data by averaging over each tied state class; and
      
      b. estimating a global mean vector.
  - 7. The method of claim 6, wherein step (a) is determined using the mathematical equation:
  - 8. The method of claim 6, wherein step (b) is determined using the mathematical equation:
  - 9. The method of claim 1, wherein step (c) is performed based on an error rate of tied state classes per an acoustic model.
  - 10. The method of claim 9, wherein the error rate for the between class scatter matrix is determined using the mathematical equation:
  - 11. The method of claim 9, wherein the error rate for the within class scatter matrix is determined using the mathematical equation:
  - 12. The method of claim 1, wherein step (d) further comprises the steps of:
    - a. performing a linear transformation;
      
      b. performing diagonalization;
      
      c. performing PCA; and
      
      d. saving the new matrices.
  - 13. The method of claim 12, wherein step (a) is performed using the mathematical equation:
  - 14. The method of claim 1, wherein step (f) further comprises the steps of:
    - a. estimating parameters with new features obtained through a transformed matrix; and
      
      b. using a maximum likelihood formula with new features to perform training.
  - 15. The method of claim 14, wherein the training in step (b) is performed using a Hidden Markov Model-Gaussian Mixture Model.
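
The sequence in claim 12 (linear transformation, diagonalization, PCA) resembles the standard two-stage simultaneous diagonalization used in LDA. The sketch below shows one way this might look, assuming the scatter matrices have already been estimated and biased. The function name `simultaneous_diagonalization` and the `floor` parameter are hypothetical, and the equation referenced in claim 13 is not reproduced in this text, so this is not necessarily the claimed transformation.

```python
import numpy as np

def simultaneous_diagonalization(S_b, S_w, n_components, floor=1e-10):
    """Find W such that W.T @ S_w @ W = I and W.T @ S_b @ W is diagonal.

    Stage 1 whitens the within-class scatter; stage 2 runs PCA on the
    between-class scatter expressed in the whitened space.
    """
    # Stage 1: eigendecompose S_w and build the whitening transform.
    w_vals, w_vecs = np.linalg.eigh(S_w)
    w_vals = np.maximum(w_vals, floor)       # guard against tiny eigenvalues
    whiten = w_vecs / np.sqrt(w_vals)        # scales column j by 1/sqrt(w_vals[j])

    # Stage 2: PCA of the between-class scatter in the whitened space.
    S_b_white = whiten.T @ S_b @ whiten
    b_vals, b_vecs = np.linalg.eigh(S_b_white)
    order = np.argsort(b_vals)[::-1][:n_components]

    # Keep the directions with the largest between-class variance.
    transform = whiten @ b_vecs[:, order]
    return transform, b_vals[order]
```

The retained eigenvalues measure how much between-class variance each direction carries relative to the within-class variance, so truncating to the largest ones keeps the most discriminative directions.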

16. A method for training an acoustic model, comprising the steps of:
- a. performing a forced alignment of speech training data;
  
  b. performing recognition on said training data and estimating error rates of each tied-state triphone;
  
  c. processing the training data and obtaining one or more of an estimated scatter matrix from which a mean vector may be estimated;
  
  d. biasing the one or more of an estimated scatter matrix;
  
  e. performing diagonalization on one or more of an estimated scatter matrix and estimating a vector to produce one or more transformed scatter matrix;
  
  f. obtaining new discriminative features using the transformed one or more of an estimated scatter matrix as a linear transformation of a vector;
  
  g. training a new acoustic model; and
  
  h. saving said acoustic model.
- - 17. The method of claim 16, wherein step (a) further comprises the step of using the current maximum likelihood acoustic model on the entire training data with a Hidden Markov Model-Gaussian Mixture Model.
  - 18. The training data of claim 17, wherein said speech training data may consist of phonemes and triphones wherein:
    - a. a triphone's Hidden Markov Model states may be mapped to tied-states;
      
      b. each feature frame may have a tied state class label; and
      
      c. said tied states may comprise unique classes between which the discrimination in an acoustic feature space is increased through selectively biased linear discriminant analysis.
  - 19. The method of claim 16, wherein step (b) further comprises the steps of:
    - a. performing tied triphone recognition on the training data using a trained model;
      
      b. recording a recognition error rate of each triphone tied state using a transcription of the training data;
      
      c. representing a segment of audio corresponding to a triphone with a 39 dimensional Mel-frequency cepstral coefficient feature vector and a first order derivative and a second order derivative;
      
      d. mapping training data set internally to a tied-triphone state;
      
      e. forming a super vector with said Mel-frequency cepstral coefficient features;
      
      f. performing a forced Viterbi alignment to assign a tied state label to each frame in the training data set; and
      
      g. estimating the one or more scatter matrices.
  - 20. The method of claim 19, wherein the error rate of step (b) is defined as i ∈ (1, 2, . . . , K).
  - 21. The method of claim 19, wherein step (g) further comprises the steps of:
    - a. estimating a mean of the super vector using the tied state labels of the training data by averaging over each tied state class; and
      
      b. estimating a global mean vector.
  - 22. The method of claim 21, wherein step (a) is determined using the mathematical equation:
  - 23. The method of claim 21, wherein step (b) is determined using the mathematical equation:
  - 24. The method of claim 16, wherein step (c) is performed based on an error rate of tied state classes per an acoustic model.
  - 25. The method of claim 16, wherein the one or more of an estimated scatter matrix comprises two scatter matrices, one is a between class scatter matrix and the other is a within class scatter matrix.
  - 26. The method of claim 25, wherein the error rate for the between class scatter matrix is determined using the mathematical equation:
  - 27. The method of claim 25, wherein the error rate for the within class scatter matrix is determined using the mathematical equation:
  - 28. The method of claim 16, wherein step (d) further comprises the steps of:
    - a. performing a linear transformation;
      
      b. performing diagonalization, wherein the diagonalization occurs simultaneously with the linear transformation;
      
      c. performing PCA; and
      
      d. saving the new matrices.
  - 29. The method of claim 27, wherein step (a) is performed using the mathematical equation:
  - 30. The method of claim 16, wherein step (f) further comprises the steps of:
    - a. estimating parameters with new features obtained through the one or more transformed matrix; and
      
      b. using a maximum likelihood formula with new features to perform training.
  - 31. The method of claim 30, wherein the training in step (b) is performed using a Hidden Markov Model-Gaussian Mixture Model.
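
The per-state error rate in claims 16(b), 19, and 20 can be read as the fraction of frames that forced alignment assigns to a tied state but that the recognizer labels differently. A minimal sketch of that computation follows, assuming both label sequences are available as frame-level integer arrays of equal length. The function name `tied_state_error_rates` is hypothetical, as is the choice to report 0.0 for states with no aligned frames.

```python
import numpy as np

def tied_state_error_rates(aligned, recognized, num_states):
    """Per-state fraction of frames whose recognized label is wrong.

    aligned:    (T,) tied-state labels from forced alignment (reference).
    recognized: (T,) tied-state labels produced by the recognizer.
    States with no aligned frames get an error rate of 0.0 (assumption).
    """
    aligned = np.asarray(aligned)
    recognized = np.asarray(recognized)
    if aligned.shape != recognized.shape:
        raise ValueError("label sequences must have the same length")

    # Frames per reference state, and misrecognized frames per reference state.
    totals = np.bincount(aligned, minlength=num_states)
    errors = np.bincount(aligned[aligned != recognized], minlength=num_states)

    rates = np.zeros(num_states)
    seen = totals > 0
    rates[seen] = errors[seen] / totals[seen]
    return rates
```

For example, if four frames are aligned to a state and the recognizer gets two of them wrong, that state's error rate is 0.5.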

32. A system for training an acoustic model comprising:
- a. means for performing a forced alignment of speech training data;
  
  b. means for processing the training data and obtaining estimated scatter matrices, which may comprise one or more of a between class scatter matrix and a within-class scatter matrix, from which mean vectors may be estimated;
  
  c. means for biasing the between class scatter matrix and the within-class scatter matrix;
  
  d. means for diagonalizing the between class scatter matrix and the within class scatter matrix and estimating eigen-vectors to produce transformed scatter matrices;
  
  e. means for obtaining new discriminative features using the transformed scatter matrices as a linear transformation of a super vector;
  
  f. means for training a new acoustic model; and
  
  g. means for saving said acoustic model.
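
The claims describe forming a super vector from 39-dimensional MFCC features and applying the transformed matrices to it as a linear transformation. One common way to build such a super vector is to concatenate each frame with its neighbors, as sketched below. The context width of three frames on each side, the edge padding at utterance boundaries, and the names `splice_frames` and `project` are assumptions for illustration; the text here does not specify how the super vector is assembled.

```python
import numpy as np

def splice_frames(features, context=3):
    """Concatenate each frame with `context` neighbors on both sides.

    features: (T, d) array, e.g. d = 39 MFCC-plus-derivative features.
    Returns a (T, (2 * context + 1) * d) array of super vectors.
    Boundary frames are padded by repeating the first/last frame
    (an assumption made for this sketch).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Window `offset` holds the frame at relative position offset - context.
    windows = [padded[offset:offset + num_frames]
               for offset in range(2 * context + 1)]
    return np.hstack(windows)

def project(super_vectors, transform):
    """Apply a (D, p) linear transform to (T, D) super vectors."""
    return super_vectors @ transform
```

With 39-dimensional input and a context of 3, each super vector has 273 dimensions, which the learned transform would then reduce to the dimensionality used for acoustic model training.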

Current Assignee: Genesys Cloud Services Incorporated
Original Assignee: Interactive Intelligence Incorporated (Genesys Cloud Services Incorporated)
Inventors: Tyagi, Vivek; Ganapathiraju, Aravind; Wyss, Felix Immanuel
Granted Patent: US 9,679,556 B2
US Class Current: 704/243
CPC Class Codes: G10L 15/063 (Training)
