Speech and speaker recognition using factor analysis to model covariance structure of mixture components
Abstract
Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. In automatic speech recognition, factor analysis models acoustic correlation by introducing a small number of parameters that capture the covariance structure of the speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the HMM training procedures, and can then be further adjusted using Minimum Classification Error (MCE) training, which demonstrates better discrimination and produces more accurate recognition models.
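As a sketch of the idea in the abstract, the snippet below simulates the standard factor analysis model x = μ + Λz + ε, with hidden factors z ~ N(0, I) and independent noise ε ~ N(0, diag(ψ)). It assumes NumPy; the function `sample_fa` and all parameter values are hypothetical. The empirical covariance of the samples approaches diag(ψ) + ΛΛᵀ, showing how a single factor column induces correlations between feature dimensions that a purely diagonal covariance cannot represent.

```python
import numpy as np

# Factor analysis model for one mixture component (illustrative sketch):
#   x = mu + Lambda @ z + eps,  z ~ N(0, I_f),  eps ~ N(0, diag(psi))
# which implies Cov[x] = diag(psi) + Lambda @ Lambda.T.

def sample_fa(mu, Lambda, psi, n, rng):
    """Draw n samples from the factor analysis generative model."""
    d, f = Lambda.shape
    z = rng.standard_normal((n, f))                   # hidden factors
    eps = rng.standard_normal((n, d)) * np.sqrt(psi)  # independent noise per dimension
    return mu + z @ Lambda.T + eps

rng = np.random.default_rng(0)
d, f = 4, 1
mu = np.zeros(d)
Lambda = np.array([[1.0], [0.8], [-0.5], [0.0]])  # hypothetical factor loadings
psi = np.array([0.5, 0.4, 0.3, 1.0])              # hypothetical diagonal variances

x = sample_fa(mu, Lambda, psi, n=200_000, rng=rng)
empirical = np.cov(x, rowvar=False)
implied = np.diag(psi) + Lambda @ Lambda.T
print(np.round(empirical, 2))
print(np.round(implied, 2))
```

Dimension 3 has a zero loading, so it remains uncorrelated with the other dimensions in both matrices, while dimensions 0 to 2 share correlation through the single factor.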
28 Claims
1. A method for recognizing speech, comprising the steps of:
deploying a speech recognizer and a memory coupled to the speech recognizer storing a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state;
storing each mixture component in said memory as a set of means, a set of variances and a matrix of factors having a number of factors; and
recognizing speech using the speech recognizer.
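Claim 1 stores each mixture component as a set of means, a set of variances and a matrix of factors. A minimal sketch of such a record follows, assuming NumPy; the class name `FAMixtureComponent` and its fields are hypothetical, and the `covariance` method assumes the usual factor analysis relation Σ = diag(ψ) + ΛΛᵀ, which the claim itself does not spell out.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class FAMixtureComponent:
    """Hypothetical in-memory layout of one mixture component (sketch)."""
    means: np.ndarray      # shape (D,)
    variances: np.ndarray  # shape (D,), the diagonal of Psi
    factors: np.ndarray    # shape (D, f), the matrix of factors Lambda

    @property
    def num_factors(self) -> int:
        # The "number of factors" in the claim is the column count of Lambda.
        return self.factors.shape[1]

    def covariance(self) -> np.ndarray:
        """Full D x D covariance implied by the stored parameters."""
        return np.diag(self.variances) + self.factors @ self.factors.T

comp = FAMixtureComponent(
    means=np.zeros(3),
    variances=np.array([1.0, 2.0, 3.0]),
    factors=np.array([[1.0], [0.0], [2.0]]),
)
print(comp.num_factors)  # prints 1
print(comp.covariance())
```

Only D + D + Df numbers are stored per component; the D x D matrix is reconstructed on demand here purely for illustration.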
5. A speech recognition system, comprising:
a feature extractor which produces a sequence of feature vectors representing speech;
a speech recognizer coupled to the feature extractor, which receives the sequence of feature vectors and produces a transcription of the speech;
a memory coupled to the speech recognizer which stores a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state; and
a set of variances and a matrix of factors stored in the memory for each mixture component, emulating a full covariance matrix corresponding to the mixture component.
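Assuming the emulated covariance takes the standard factor analysis form Σ = diag(ψ) + ΛΛᵀ, the Gaussian log-likelihood of a feature vector can be evaluated without building or inverting the D x D matrix. The sketch below uses two standard linear-algebra identities, the Woodbury identity for the quadratic term and the matrix determinant lemma for log|Σ|; whether the recognizer described here computes scores this way is not stated, and the function names and the dimensions D = 39, f = 2 are illustrative assumptions.

```python
import numpy as np

def fa_log_likelihood(x, mu, psi, Lambda):
    """log N(x; mu, diag(psi) + Lambda Lambda^T) without forming the D x D matrix."""
    d, f = Lambda.shape
    r = x - mu
    psi_inv = 1.0 / psi
    # f x f core matrix M = I + Lambda^T Psi^{-1} Lambda
    M = np.eye(f) + Lambda.T @ (psi_inv[:, None] * Lambda)
    # Matrix determinant lemma: log|Sigma| = log|Psi| + log|M|
    _, logdet_M = np.linalg.slogdet(M)
    logdet = np.sum(np.log(psi)) + logdet_M
    # Woodbury: r^T Sigma^{-1} r = r^T Psi^{-1} r - u^T M^{-1} u, with u = Lambda^T Psi^{-1} r
    u = Lambda.T @ (psi_inv * r)
    quad = np.sum(psi_inv * r * r) - u @ np.linalg.solve(M, u)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def full_cov_log_likelihood(x, mu, sigma):
    """Reference computation with an explicit full covariance matrix."""
    d = x.shape[0]
    r = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = r @ np.linalg.solve(sigma, r)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
d, f = 39, 2  # hypothetical feature dimension and number of factors
mu = rng.standard_normal(d)
psi = rng.uniform(0.5, 2.0, size=d)
Lambda = rng.standard_normal((d, f))
x = rng.standard_normal(d)

fast = fa_log_likelihood(x, mu, psi, Lambda)
slow = full_cov_log_likelihood(x, mu, np.diag(psi) + Lambda @ Lambda.T)
print(fast, slow)  # the two values agree up to floating-point rounding
```

The factored form costs on the order of D·f² operations per evaluation instead of the D³ needed to factor a general full covariance, which is why a small number of factors stays cheap even when D is large.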
9. A method of making an HMM stored in a memory for use in a speech recognition system, wherein the HMM comprises one or more units, each unit represented with one or more states using one or more mixture components per state, the method comprising the steps of:
determining a set of parameters for each mixture component from training data; and
storing the set of parameters for each mixture component in the memory in a condensed form using factor analysis.
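To make "condensed form" concrete, the count below compares the free covariance parameters per mixture component for a full symmetric covariance and for a factor analysis representation with D variances and a D x f matrix of factors (the D means are stored in both cases and are left out). The values D = 39 and f = 2 are illustrative assumptions, not figures taken from this document.

```latex
\begin{align*}
\text{full covariance:}\quad & \tfrac{D(D+1)}{2} \\
\text{factor analysis:}\quad & \underbrace{D}_{\text{variances } \Psi} + \underbrace{Df}_{\text{factors } \Lambda} = D(f+1) \\
D(f+1) < \tfrac{D(D+1)}{2} \quad & \iff \quad f < \tfrac{D-1}{2} \\
D = 39,\ f = 2:\quad & \tfrac{39 \cdot 40}{2} = 780 \quad \text{vs.} \quad 39 \cdot 3 = 117
\end{align*}
```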
14. A method of making a set of HMMs stored in a memory for use in a speech recognition system, the set of HMMs comprising one or more units, each unit represented with one or more states using one or more mixture components per state, and each mixture component comprising a diagonal covariance matrix, the method comprising the steps of:
determining parameters of the diagonal covariance matrix for each mixture component from training data;
estimating a number of factors using factor analysis to form a matrix of factors for each mixture component which provides a covariance structure for each diagonal covariance matrix; and
storing the diagonal covariance matrix and the matrix of factors for each mixture component in the memory.
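Claim 14 first determines the diagonal covariance and then estimates a matrix of factors by factor analysis. One way to carry out the second step, sketched here under assumptions, is the textbook EM algorithm for a single factor analyzer: it starts ψ from the diagonal of a covariance matrix S and alternates between computing the factors' posterior statistics and re-estimating Λ and ψ. The function `fa_em`, its variance floor and the example parameters are hypothetical; inside HMM training, S would presumably be an occupancy-weighted covariance accumulated for each mixture component, whereas here it is simply given.

```python
import numpy as np

def fa_em(S, num_factors, num_iters=500, seed=0, var_floor=1e-6):
    """Fit S ~= diag(psi) + Lambda Lambda^T by EM for factor analysis (sketch).

    S is a D x D covariance of mean-centered data for one mixture component.
    """
    d = S.shape[0]
    rng = np.random.default_rng(seed)
    psi = np.diag(S).copy()  # start from the diagonal covariance
    Lambda = 0.1 * rng.standard_normal((d, num_factors))
    for _ in range(num_iters):
        # E-step: beta = Lambda^T (Psi + Lambda Lambda^T)^{-1}, shape (f, D)
        sigma = np.diag(psi) + Lambda @ Lambda.T
        beta = np.linalg.solve(sigma, Lambda).T
        # Expected second moment of the hidden factors, averaged over the data
        Ezz = np.eye(num_factors) - beta @ Lambda + beta @ S @ beta.T
        # M-step: Lambda = S beta^T Ezz^{-1}, then the new diagonal variances
        Lambda = np.linalg.solve(Ezz, beta @ S).T
        # Hypothetical safeguard: floor the variances to keep Psi invertible
        psi = np.maximum(np.diag(S - Lambda @ beta @ S), var_floor)
    return psi, Lambda

# Population covariance with an exact one-factor structure (hypothetical values)
lam_true = np.array([[0.9], [0.8], [0.7], [0.6], [0.5]])
psi_true = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
S = np.diag(psi_true) + lam_true @ lam_true.T

psi_hat, lam_hat = fa_em(S, num_factors=1)
print(np.round(psi_hat, 3))
print(np.round(np.diag(psi_hat) + lam_hat @ lam_hat.T - S, 4))  # residual, near zero
```

The sign of the recovered Λ is arbitrary, since Λ and −Λ give the same covariance, so the check compares the implied covariance rather than the loadings themselves.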
21. A speech recognizer, comprising:
a set of recognition models, each recognition model having one or more states, each state represented by one or more mixture components, wherein each mixture component is stored in a memory as a set of variances and a matrix of factors to emulate a full covariance matrix.
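When a state is represented by several such mixture components, its score for a feature vector is typically a weighted sum of component densities. The sketch below assumes mixture weights w_k (standard in Gaussian-mixture HMMs, though not recited in claim 21), a factor analysis covariance diag(ψ) + ΛΛᵀ for each component, and a log-sum-exp reduction for numerical stability. The names `fa_logpdf` and `state_log_likelihood` and all example values are hypothetical.

```python
import numpy as np

def fa_logpdf(x, mu, psi, Lambda):
    """Gaussian log-density with covariance diag(psi) + Lambda Lambda^T (Woodbury form)."""
    r, pinv = x - mu, 1.0 / psi
    M = np.eye(Lambda.shape[1]) + Lambda.T @ (pinv[:, None] * Lambda)
    u = Lambda.T @ (pinv * r)
    quad = np.sum(pinv * r * r) - u @ np.linalg.solve(M, u)
    logdet = np.sum(np.log(psi)) + np.linalg.slogdet(M)[1]
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def state_log_likelihood(x, weights, components):
    """log sum_k w_k N(x; mu_k, Sigma_k) for one HMM state, computed stably."""
    scores = np.array([np.log(w) + fa_logpdf(x, *c) for w, c in zip(weights, components)])
    top = scores.max()  # subtract the max before exponentiating to avoid underflow
    return top + np.log(np.sum(np.exp(scores - top)))

rng = np.random.default_rng(2)
d, f = 6, 2
# Each component is a (means, variances, factors) triple.
components = [
    (rng.standard_normal(d), rng.uniform(0.5, 1.5, d), rng.standard_normal((d, f)))
    for _ in range(3)
]
weights = np.array([0.5, 0.3, 0.2])  # hypothetical mixture weights
x = rng.standard_normal(d)
print(state_log_likelihood(x, weights, components))
```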
28. A method for use in a speech recognition system, the speech recognition system comprising a speech recognizer and a memory coupled to the speech recognizer storing a set of HMMs, each HMM comprising one or more units, each unit represented with one or more states using one or more mixture components per state, the method comprising the step of:
applying factor analysis at a level of the set of HMMs selected from the group consisting of the unit level, the state level or the mixture component level.
Specification