Speech and speaker recognition using factor analysis to model covariance structure of mixture components

US 5,946,656 A
Filed: 11/17/1997
Issued: 08/31/1999
Est. Priority Date: 11/17/1997
Status: Expired due to Term

Abstract

Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation-Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
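
The "small number of parameters" in the abstract can be made concrete with a short sketch. It assumes the usual factor-analysis form, in which the covariance of a mixture component is a diagonal matrix of variances plus the product of a factor matrix with its transpose. The function names and the dimension of 39 used in the usage note are illustrative assumptions and do not come from this record.

```python
import numpy as np

def emulate_full_covariance(variances, factors):
    """Build a dense covariance from a variance vector (D,) and a factor matrix (D, f).

    Assumes the factor-analysis form: diag(variances) + factors @ factors.T.
    """
    variances = np.asarray(variances, dtype=float)
    factors = np.asarray(factors, dtype=float)
    return np.diag(variances) + factors @ factors.T

def covariance_parameter_counts(dim, num_factors):
    """Count covariance parameters per mixture component for three model types."""
    return {
        "diagonal": dim,                       # variances only
        "factored": dim * (1 + num_factors),   # variances plus a D x f factor matrix
        "full": dim * (dim + 1) // 2,          # symmetric D x D matrix
    }
```

For a hypothetical 39-dimensional feature vector with 2 factors, `covariance_parameter_counts(39, 2)` gives 39 parameters for a diagonal model, 117 for the factored model, and 780 for a full covariance matrix.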

28 Claims

1. A method for recognizing speech, comprising the steps of:
- deploying a speech recognizer and a memory coupled to the speech recognizer storing a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state;
  
  storing each mixture component in said memory as a set of means, a set of variances and a matrix of factors having a number of factors; and
  
  recognizing speech using the speech recognizer.
- Dependent claims (2, 3, 4)
- - 2. A method as defined in claim 1, wherein:
    - the set of variances and the matrix of factors for each mixture component emulate a full covariance matrix that has been reduced in dimensionality by factor analysis.
  - 3. A method as defined in claim 1, further comprising the step of:
    - using factor analysis to make the set of variances and the matrix of factors.
  - 4. A method as defined in claim 1, further comprising the step of:
    - using the set of variances and the matrix of factors to emulate a full covariance matrix.
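
Claims 1 to 4 describe each mixture component as a stored set of means, a set of variances, and a matrix of factors that together emulate a full covariance matrix. The following is a minimal sketch of how such a component might be scored against a feature vector. It assumes a Gaussian component whose covariance is diag(variances) + factors · factorsᵀ, and it uses the matrix inversion lemma so that only a small f × f system is solved. This record does not state how the likelihood is evaluated, so the approach and the function name are assumptions.

```python
import numpy as np

def factored_gaussian_logpdf(x, mean, variances, factors):
    """Log-density of a Gaussian with covariance diag(variances) + factors @ factors.T.

    x, mean, variances have shape (D,); factors has shape (D, f).
    The dense D x D covariance is never formed or inverted.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    variances = np.asarray(variances, dtype=float)
    factors = np.asarray(factors, dtype=float)
    dim, num_factors = factors.shape

    d = x - mean
    inv_var = 1.0 / variances
    scaled = factors * inv_var[:, None]              # Psi^-1 Lambda, shape (D, f)
    m = np.eye(num_factors) + factors.T @ scaled     # I + Lambda^T Psi^-1 Lambda
    proj = scaled.T @ d                              # Lambda^T Psi^-1 d

    # Mahalanobis distance via the matrix inversion lemma.
    maha = d @ (inv_var * d) - proj @ np.linalg.solve(m, proj)

    # log det(Sigma) = log det(m) + sum(log variances).
    _, logdet_m = np.linalg.slogdet(m)
    logdet = logdet_m + np.sum(np.log(variances))

    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)
```

With a small number of factors, the cost per component grows roughly linearly in the feature dimension, which is the practical appeal of keeping the covariance in this condensed form.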

5. A speech recognition system, comprising:
- a feature extractor which produces a sequence of feature vectors representing speech;
  
  a speech recognizer coupled to the feature extractor, which receives the sequence of feature vectors and produces a transcription of the speech;
  
  a memory coupled to the speech recognizer which stores a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state; and
  
  a set of variances and a matrix of factors stored in the memory for each mixture component, emulating a full covariance matrix corresponding to the mixture component.
- Dependent claims (6, 7, 8)
- - 6. A system as defined in claim 5, wherein:
    - each mixture component is a Gaussian probability density function.
  - 7. A system as defined in claim 5, wherein:
    - the full covariance matrix corresponding to each mixture component is stored in the memory in a condensed form using factor analysis.
  - 8. A system as defined in claim 7, wherein:
    - the condensed form includes the set of variances and the matrix of factors.

9. A method of making an HMM stored in a memory for use in a speech recognition system, wherein the HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state, the method comprising the steps of:
- determining a set of parameters for each mixture component from training data; and
  
  storing the set of parameters for each mixture component in the memory in a condensed form using factor analysis.
- Dependent claims (10, 11, 12, 13)
- - 10. A method as defined in claim 9, wherein:
    - each mixture component is a Gaussian probability density function.
  - 11. A method as defined in claim 9, wherein:
    - the condensed form comprises a set of variances and a matrix of factors which correspond to a full covariance matrix reduced in dimensionality by factor analysis.
  - 12. A method as defined in claim 9, further comprising the step of:
    - converting the set of parameters for each mixture component into a set of variances and a matrix of factors.
  - 13. A method as defined in claim 9, further comprising the step of:
    - adjusting one or more parameters based on additional training data.

14. A method of making a set of HMMs stored in a memory for use in a speech recognition system, the set of HMMs comprising one or more units, each unit represented with one or more states using one or more mixture components per state, and each mixture component comprising a diagonal covariance matrix, the method comprising the steps of:
- determining parameters of the diagonal covariance matrix for each mixture component from training data;
  
  estimating a number of factors using factor analysis to form a matrix of factors for each mixture component which provides a covariance structure for each diagonal covariance matrix; and
  
  storing the diagonal covariance matrix and the matrix of factors for each mixture component in the memory.
- Dependent claims (15, 16, 17, 18, 19, 20)
- - 15. A method as defined in claim 14, further comprising the step of:
    - varying the number of mixture components per state.
  - 16. A method as defined in claim 14, further comprising the step of:
    - varying the number of factors forming the matrix of factors for each mixture component.
  - 17. A method as defined in claim 14, further comprising the step of:
    - adjusting the number of factors per mixture component based on the amount of training data for each state.
  - 18. A method as defined in claim 14, further comprising the step of:
    - training the set of HMMs using a maximum likelihood training process.
  - 19. A method as defined in claim 14, further comprising the step of:
    - training the set of HMMs using a discriminative training process.
  - 20. A method as defined in claim 19, wherein:
    - the discriminative training process includes minimum classification error training.
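
Claim 14 starts from a diagonal covariance matrix estimated from training data and then uses factor analysis to estimate a matrix of factors, and the abstract names an EM technique for this. The record gives no update equations, so the sketch below uses the standard textbook EM updates for factor analysis on a single Gaussian component. It is not embedded in HMM training and does not include the discriminative (MCE) adjustment of claims 19 and 20. The function name, the variance floor, and the random initialization are assumptions.

```python
import numpy as np

def fit_factor_analysis(data, num_factors, num_iters=50, seed=0, var_floor=1e-6):
    """Fit mean, diagonal variances (D,), and a factor matrix (D, f) by EM.

    data has shape (N, D). Returns (mean, variances, factors).
    """
    data = np.asarray(data, dtype=float)
    n, dim = data.shape
    mean = data.mean(axis=0)
    centered = data - mean
    s = centered.T @ centered / n                   # sample covariance

    # Start from the diagonal-covariance model plus small random factors.
    psi = np.diag(s).copy()
    rng = np.random.default_rng(seed)
    lam = 0.1 * rng.standard_normal((dim, num_factors))

    for _ in range(num_iters):
        # E-step: posterior statistics of the latent factors.
        scaled = lam / psi[:, None]                 # Psi^-1 Lambda
        g = np.linalg.inv(np.eye(num_factors) + lam.T @ scaled)
        beta = g @ scaled.T                         # maps x - mean to E[z | x]
        bs = beta @ s
        ezz = g + bs @ beta.T                       # average of E[z z^T | x]

        # M-step: update the factor matrix, then the variances.
        lam = np.linalg.solve(ezz, bs).T            # S beta^T ezz^-1
        psi = np.maximum(np.diag(s - lam @ bs), var_floor)

    return mean, psi, lam
```

The number of factors is a free argument here, which is where the choices in claims 16 and 17 (varying the number of factors, for example with the amount of training data per state) would enter.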

21. A speech recognizer, comprising:
- a set of recognition models, each recognition model having one or more states, each state represented by one or more mixture components, wherein each mixture component is stored in a memory as a set of variances and a matrix of factors to emulate a full covariance matrix.
- Dependent claims (22, 23, 24, 25, 26, 27)
- - 22. A speech recognizer as defined in claim 21, wherein:
    - each recognition model is an HMM.
  - 23. A speech recognizer as defined in claim 21, wherein:
    - each mixture component is a Gaussian probability density function.
  - 24. A speech recognizer as defined in claim 21, wherein:
    - the recognition models are produced by a maximum likelihood training process.
  - 25. A speech recognizer as defined in claim 21, wherein:
    - the recognition models are produced by a discriminative training process.
  - 26. A speech recognizer as defined in claim 25, wherein:
    - the discriminative training process includes minimum classification error training.
  - 27. A speech recognizer as defined in claim 21, wherein:
    - each matrix of factors having a uniform number of factors per mixture component.

28. A method for use in a speech recognition system, the speech recognition system comprising a speech recognizer and a memory coupled to the speech recognizer storing a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state, the method comprising the step of:
- applying factor analysis at a level of the set of HMMs selected from the group consisting of the unit level, the state level or the mixture component level.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Rahim, Mazin G.; Saul, Lawrence K.
Primary Examiner(s): Knepper, David D.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/971,838
Time in Patent Office: 652 Days
Field of Search: 704/236, 704/240, 704/256
US Class Current: 704/256.2
CPC Class Codes:
- G10L 15/144   Training of HMMs
- G10L 17/16   Hidden Markov models [HMM]
- G10L 2015/085   Methods for reducing search...
