Low complexity, high accuracy clustering method for speech recognizer
Abstract
The clustering technique produces a low complexity and yet high accuracy speech representation for use with speech recognizers. The task database comprising the test speech to be modeled is segmented into subword units such as phonemes and labeled to indicate each phoneme in its left and right context (triphones). Hidden Markov Models are constructed for each context-independent phoneme and trained. Then the center states are tied for all phonemes of the same class. Triphones are trained and all poorly-trained models are eliminated by merging their training data with the nearest well-trained model using a weighted divergence computation to ascertain distance. Before merging, the threshold for each class is adjusted until the number of good models for each phoneme class is within predetermined upper and lower limits. Finally, if desired, the number of mixture components used to represent each model may be increased and the models retrained. This latter step increases the accuracy.
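The labeling step described in the abstract, where each phoneme is marked with its left and right neighbors to form a triphone, can be sketched as follows. This is a minimal illustration, not the patent's implementation. The `l-c+r` label format, the `sil` padding at utterance boundaries, and the function name `to_triphones` are all assumptions.

```python
def to_triphones(phonemes, boundary="sil"):
    """Label each phoneme with its left and right context as 'left-center+right'.

    The 'l-c+r' notation and the boundary padding are illustrative
    assumptions, not taken from the patent text.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Example: a hypothetical phoneme string for the word "cat".
print(to_triphones(["k", "ae", "t"]))  # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```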
10 Claims
1. A clustering method for processing speech training data to generate a set of low complexity statistical models for use in automated speech recognition, comprising:
segmenting the training data into labeled subword units;
generating Hidden Markov Models to represent said subword units;
selecting a desired number of models to be between a predetermined minimum and a predetermined maximum by adjusting a threshold on the number of examples per model;
training said models with said segmented training data to generate:
(a) a first plurality of populated models based on instances of training data above said threshold, and
(b) a second plurality of populated models based on instances of training data below said threshold;
merging each model of said second plurality with the closest neighbor of the models of said first plurality to form a set of new models and retraining the new models.
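The partition-and-merge core of claim 1 can be illustrated with a toy example. This sketch assumes each model is summarized by one-dimensional feature values, treats models with at least `threshold` examples as well trained (the claim says only "above" and "below"), measures closeness by the gap between sample means instead of the weighted divergence the abstract describes, and stands in for retraining by re-estimating each mean from the pooled data. `merge_sparse_models` is a hypothetical name.

```python
from statistics import mean


def merge_sparse_models(data, threshold):
    """Fold each under-trained model's data into its nearest well-trained model.

    `data` maps a model name (e.g. a triphone label) to a non-empty list of
    1-D feature values. Models with >= `threshold` examples count as well
    trained (an assumed tie-break). Closeness is the absolute difference of
    sample means, a simple stand-in for a weighted divergence.
    """
    good = {m: list(x) for m, x in data.items() if len(x) >= threshold}
    poor = {m: x for m, x in data.items() if len(x) < threshold}
    if not good:
        raise ValueError("threshold leaves no well-trained models")

    # Neighbor search uses the well-trained models' original means.
    centers = {m: mean(x) for m, x in good.items()}
    mapping = {m: m for m in good}
    for name, samples in poor.items():
        target = mean(samples)
        nearest = min(centers, key=lambda g: abs(centers[g] - target))
        good[nearest].extend(samples)  # pool the training instances
        mapping[name] = nearest

    # "Retraining" here is just re-estimating each mean from the pooled data.
    retrained = {m: mean(x) for m, x in good.items()}
    return retrained, mapping


data = {
    "a-b+c": [1.0, 1.2, 0.8, 1.0],
    "x-b+y": [5.0, 5.2, 4.8],
    "a-b+y": [1.5],
    "x-b+c": [4.4],
}
models, mapping = merge_sparse_models(data, threshold=3)
print(mapping["a-b+y"], mapping["x-b+c"])  # a-b+c x-b+y
```

The two single-example models are absorbed by the well-populated model nearest to them, leaving two models in the final set.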
2. The method of claim 1 wherein said Hidden Markov Models employ Gaussian functions to represent states within the models and wherein said method further comprises increasing the number of Gaussian functions per state after said merging step is performed.
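Claim 2 does not say how the extra Gaussians are introduced. One widely used heuristic, assumed here, is mixture splitting: each component is cloned, the two copies' means are pushed apart by a fraction of a standard deviation, and their weights are halved so the mixture still sums to one, after which the models are retrained. `Gaussian`, `split_mixture`, and the 0.2 offset are illustrative choices, not the patent's.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: list[float]
    var: list[float]  # diagonal covariance entries


def split_mixture(components, offset=0.2):
    """Double the number of Gaussians in one HMM state's mixture.

    Each component becomes two copies with means shifted by +/- `offset`
    standard deviations and half the weight; variances are copied.
    The perturbed mixture is meant to be retrained afterwards.
    """
    new = []
    for g in components:
        shift = [offset * math.sqrt(v) for v in g.var]
        for sign in (1.0, -1.0):
            new.append(Gaussian(
                weight=g.weight / 2.0,
                mean=[m + sign * s for m, s in zip(g.mean, shift)],
                var=list(g.var),
            ))
    return new


state = [Gaussian(1.0, [0.0, 1.0], [4.0, 1.0])]
state = split_mixture(state)
print(len(state), [g.weight for g in state])  # 2 [0.5, 0.5]
```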
3. The method of claim 1 wherein each model of said second plurality is merged with the closest one of the models of said first plurality using a weighted distance to select said closest neighbor of the models.
4. The method of claim 3 wherein each model of said first plurality has a corresponding first number of training instances and each model of said second plurality has a corresponding second number of training instances and wherein said weighted distance is inversely proportional to the sum of the respective numbers of training instances.
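Claims 3 and 4 fix only the form of the weighting: the distance used to pick the closest neighbor is inversely proportional to the combined number of training instances. The sketch below assumes that each model is summarized by a single diagonal-covariance Gaussian and that the underlying measure is the symmetric Kullback-Leibler divergence; both choices, and the function names, are assumptions.

```python
def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for diagonal Gaussians.

    The log-determinant terms cancel in the symmetric form.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + vq / vp - 2.0 + (mp - mq) ** 2 * (1.0 / vp + 1.0 / vq)
    return 0.5 * total


def weighted_distance(good, poor):
    """Divergence scaled by 1 / (n_good + n_poor), following claim 4.

    `good` and `poor` are (mean, var, n_instances) tuples.
    """
    (mean_g, var_g, n_good), (mean_p, var_p, n_poor) = good, poor
    return symmetric_kl(mean_g, var_g, mean_p, var_p) / (n_good + n_poor)


def closest_neighbor(poor, candidates):
    """Name of the well-trained model with the smallest weighted distance."""
    return min(candidates, key=lambda name: weighted_distance(candidates[name], poor))


poor = ([1.0], [1.0], 3)
candidates = {"A": ([0.0], [1.0], 97), "B": ([1.5], [1.0], 1)}
print(closest_neighbor(poor, candidates))  # A
```

Unweighted, `B` is the closer model (divergence 0.25 versus 1.0), but dividing by the instance counts (0.25 / 4 versus 1.0 / 100) steers the merge toward the well-populated `A`.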
5. The method of claim 1 wherein said merging step is performed class by class such that for each class the number of new models is between said predefined upper and lower limits.
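Claim 5 applies the threshold adjustment within each class, as the abstract describes. The sketch below assumes the search starts at one example and raises the threshold until the number of models with at least that many examples falls within the limits, which keeps as many well-trained models as the upper limit allows. The search order, the `>=` convention, and the function names are assumptions.

```python
def adjust_threshold(counts, lower, upper):
    """Smallest example-count threshold leaving between `lower` and `upper`
    well-trained models (count >= threshold), or None if no threshold works.

    `counts` lists the number of training examples for each model in one class.
    """
    for threshold in range(1, max(counts, default=0) + 2):
        n_good = sum(1 for c in counts if c >= threshold)
        if lower <= n_good <= upper:
            return threshold
    return None


def thresholds_per_class(counts_by_class, lower, upper):
    """Run the threshold search independently for each phoneme class."""
    return {cls: adjust_threshold(c, lower, upper) for cls, c in counts_by_class.items()}


counts_by_class = {"vowel": [120, 80, 45, 30, 12, 5, 2], "stop": [60, 9, 8, 3]}
print(thresholds_per_class(counts_by_class, lower=2, upper=3))
# {'vowel': 31, 'stop': 4}
```

Because the number of well-trained models can only fall as the threshold rises, a class whose count jumps past the window (for example three models with identical counts and a window of one to two) gets `None`, signalling that the limits cannot be met for that class.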
6. A clustering method for processing speech training data to generate a set of low complexity statistical models for use in automated speech recognition, comprising:
segmenting the training data into labeled subword units, said subword units each being a member of one of a plurality of classes;
generating Hidden Markov Models to represent said subword units, said models having a plurality of states including an intermediate state;
tying said intermediate states of all models that represent subword units of the same class to define a plurality of state-tied models;
selecting a desired number of models to be between a predetermined minimum and a predetermined maximum by adjusting a threshold on the number of examples per model;
training said state-tied models with said segmented training data to generate:
(a) a first plurality of populated models based on instances of training data above said predetermined threshold, and
(b) a second plurality of populated models based on instances of training data below said predetermined threshold;
merging each model of said second plurality with the closest neighbor of the models of said first plurality to form a set of new models and retraining the new models.
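The state tying in claim 6 amounts to making the intermediate state a single shared object for every model in a class, so that training data from any of those models updates the same parameters. The sketch assumes three-state models, triphone labels in `l-c+r` form, and a `phone_class` table mapping each center phoneme to its class; how classes are defined, and all names used, are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str  # stand-in for the state's Gaussian parameters


@dataclass
class ThreeStateHMM:
    label: str
    states: list[State] = field(default_factory=list)


def build_tied_models(triphones, phone_class):
    """Build 3-state models whose middle state is one shared State object
    for all triphones whose center phoneme maps to the same class."""
    shared_center = {}
    models = {}
    for tri in triphones:
        _, rest = tri.split("-", 1)        # drop the left context
        center, _ = rest.split("+", 1)     # drop the right context
        cls = phone_class[center]
        if cls not in shared_center:
            shared_center[cls] = State(f"{cls}_center")
        models[tri] = ThreeStateHMM(
            tri, [State(f"{tri}_s1"), shared_center[cls], State(f"{tri}_s3")]
        )
    return models


models = build_tied_models(
    ["k-ae+t", "p-ae+t", "ae-t+sil"], {"ae": "vowel", "t": "stop"}
)
print(models["k-ae+t"].states[1] is models["p-ae+t"].states[1])  # True
```

Only the middle state is shared; the first and last states remain private to each triphone, so the models still differ in how they enter and leave the tied center.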
7. The method of claim 6 wherein said Hidden Markov Models employ Gaussian functions to represent states within the models and wherein said method further comprises increasing the number of Gaussian functions per state after said merging step is performed.
8. The method of claim 6 wherein each model of said second plurality is merged with the closest one of the models of said first plurality using a weighted distance to select said closest neighbor of the models.
9. The method of claim 8 wherein each model of said first plurality has a corresponding first number of training instances and each model of said second plurality has a corresponding second number of training instances and wherein said weighted distance is inversely proportional to the sum of the respective numbers of training instances.
10. The method of claim 6 wherein said merging step is performed class by class such that for each class the number of new models is between said predefined upper and lower limits.
Specification