Low complexity, high accuracy clustering method for speech recognizer

US 5,806,030 A
Filed: 05/06/1996
Issued: 09/08/1998
Est. Priority Date: 05/06/1996
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

The clustering technique produces a low-complexity yet high-accuracy speech representation for use with speech recognizers. The task database comprising the test speech to be modeled is segmented into subword units such as phonemes and labeled to indicate each phoneme in its left and right context (triphones). Hidden Markov Models are constructed for each context-independent phoneme and trained. Then the center states are tied for all phonemes of the same class. Triphones are trained, and all poorly trained models are eliminated by merging their training data with the nearest well-trained model, using a weighted divergence computation to ascertain distance. Before merging, the threshold for each class is adjusted until the number of good models for each phoneme class is within predetermined upper and lower limits. Finally, if desired, the number of mixture components used to represent each model may be increased and the models retrained. This final step increases accuracy.
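The per-class threshold adjustment described above (and in claim 1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the "l-c+r" triphone labels, and the example counts are hypothetical, and a model is treated as well populated when it has at least t training examples, because the patent text does not say how a count exactly equal to the threshold is handled. Raising the threshold can only shrink the number of well-populated models, so a simple step-up/step-down search is enough.

```python
from typing import Dict, List, Tuple


def count_good(counts: Dict[str, int], threshold: int) -> int:
    """Number of models with at least `threshold` training examples
    (treating 'equal to the threshold' as good is an assumption)."""
    return sum(1 for c in counts.values() if c >= threshold)


def adjust_threshold(counts: Dict[str, int], lower: int, upper: int, start: int = 1) -> int:
    """Step one class's threshold until the number of well-populated models
    lies in [lower, upper]. Assumes upper >= 0 so the first loop terminates."""
    t = max(1, start)
    # Too many good models: raise the threshold.
    while count_good(counts, t) > upper:
        t += 1
    # Too few good models: lower the threshold.
    while count_good(counts, t) < lower and t > 1:
        t -= 1
    # With heavy ties no threshold may land exactly in range; this sketch then
    # returns the largest threshold that still meets the lower limit
    # (or 1 if the class has fewer than `lower` models in total).
    return t


def split_models(counts: Dict[str, int], threshold: int) -> Tuple[List[str], List[str]]:
    """Split a class into well-populated and poorly populated model names."""
    good = sorted(k for k, c in counts.items() if c >= threshold)
    poor = sorted(k for k, c in counts.items() if c < threshold)
    return good, poor


if __name__ == "__main__":
    # Hypothetical training-example counts for triphones of one phoneme class.
    counts = {"b-ah+d": 50, "k-ah+t": 30, "s-ah+n": 12, "m-ah+d": 4, "p-ah+l": 2}
    t = adjust_threshold(counts, lower=2, upper=3)
    good, poor = split_models(counts, t)
    print(t, good, poor)  # 5 ['b-ah+d', 'k-ah+t', 's-ah+n'] ['m-ah+d', 'p-ah+l']
```

Calling adjust_threshold once per phoneme class and then split_models yields the two pluralities named in claim 1; the poor list is what the merging step later folds into its nearest well-populated neighbor.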

72 Citations

10 Claims

1. A clustering method for processing speech training data to generate a set of low complexity statistical models for use in automated speech recognition, comprising:
- segmenting the training data into labeled subword units;
- generating Hidden Markov Models to represent said subword units, selecting a desired number of models to be between a predetermined minimum and a predetermined maximum by adjusting a threshold on the number of examples per model;
- training said models with said segmented training data to generate:
  (a) a first plurality of populated models based on instances of training data above said threshold, and
  (b) a second plurality of populated models based on instances of training data below said threshold;
- merging each model of said second plurality with the closest neighbor of the models of said first plurality to form a set of new models and retraining the new models.

2. The method of claim 1 wherein said Hidden Markov Models employ Gaussian functions to represent states within the models and wherein said method further comprises increasing the number of Gaussian functions per state after said merging step is performed.

3. The method of claim 1 wherein each model of said second plurality is merged with the closest one of the models of said first plurality using a weighted distance to select said closest neighbor of the models.

4. The method of claim 3 wherein each model of said first plurality has a corresponding first number of training instances and each model of said second plurality has a corresponding second number of training instances and wherein said weighted distance is inversely proportional to the sum of the respective numbers of training instances.

5. The method of claim 1 wherein said merging step is performed class by class such that for each class the number of new models is between said predefined upper and lower limits.

6. A clustering method for processing speech training data to generate a set of low complexity statistical models for use in automated speech recognition, comprising:
- segmenting the training data into labeled subword units, said subword units each being a member of one of a plurality of classes;
- generating Hidden Markov Models to represent said subword units, said models having a plurality of states including an intermediate state;
- tying said intermediate states of all models that represent subword units of the same class to define a plurality of state-tied models;
- selecting a desired number of models to be between a predetermined minimum and a predetermined maximum by adjusting a threshold on the number of examples per model;
- training said state-tied models with said segmented training data to generate:
  (a) a first plurality of populated models based on instances of training data above said predetermined threshold, and
  (b) a second plurality of populated models based on instances of training data below said predetermined threshold;
- merging each model of said second plurality with the closest neighbor of the models of said first plurality to form a set of new models and retraining the new models.

7. The method of claim 6 wherein said Hidden Markov Models employ Gaussian functions to represent states within the models and wherein said method further comprises increasing the number of Gaussian functions per state after said merging step is performed.

8. The method of claim 6 wherein each model of said second plurality is merged with the closest one of the models of said first plurality using a weighted distance to select said closest neighbor of the models.

9. The method of claim 8 wherein each model of said first plurality has a corresponding first number of training instances and each model of said second plurality has a corresponding second number of training instances and wherein said weighted distance is inversely proportional to the sum of the respective numbers of training instances.

10. The method of claim 6 wherein said merging step is performed class by class such that for each class the number of new models is between said predefined upper and lower limits.
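Claims 3-4 (and 8-9) describe merging each poorly populated model into its closest well-populated neighbor under a weighted distance. The sketch below is one hedged reading of that step, with simplifying assumptions: each model is collapsed to a single diagonal Gaussian (the hypothetical Model class) rather than a multi-state HMM; the divergence is the symmetric Kullback-Leibler divergence, which is an assumption since the patent only speaks of a "weighted divergence"; the weighting divides that divergence by N_poor + N_good, a literal reading of "inversely proportional to the sum of the respective numbers of training instances"; and "retraining" is approximated by pooling sufficient statistics, which gives the maximum-likelihood Gaussian for the pooled data when each input mean and variance is itself an ML estimate. A real recognizer would instead re-run Baum-Welch training on the merged data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

VAR_FLOOR = 1e-6  # hypothetical variance floor to keep pooled variances positive


@dataclass
class Model:
    """Hypothetical stand-in for a trained triphone model: a single diagonal
    Gaussian (per-dimension mean and variance) plus its training-example count."""
    mean: List[float]
    var: List[float]
    count: int


def sym_kl(a: Model, b: Model) -> float:
    """Symmetric Kullback-Leibler divergence KL(a||b) + KL(b||a) between two
    diagonal Gaussians (the choice of divergence is an assumption)."""
    total = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        total += 0.5 * (va / vb + vb / va - 2.0 + (ma - mb) ** 2 * (1.0 / va + 1.0 / vb))
    return total


def weighted_distance(poor: Model, good: Model) -> float:
    """Divergence scaled to be inversely proportional to N_poor + N_good,
    following a literal reading of claim 4."""
    return sym_kl(poor, good) / (poor.count + good.count)


def merge_poor_models(
    good: Dict[str, Model], poor: Dict[str, Model]
) -> Tuple[Dict[str, Model], Dict[str, str]]:
    """Merge every poorly populated model into its nearest well-populated model.

    Returns the re-estimated good models and a map poor name -> absorbing model.
    Raises ValueError if `poor` is non-empty but `good` is empty."""
    # Sufficient statistics per good model: [N, sum of x, sum of x^2] per dimension.
    stats = {
        name: [m.count,
               [m.count * mu for mu in m.mean],
               [m.count * (v + mu * mu) for mu, v in zip(m.mean, m.var)]]
        for name, m in good.items()
    }
    assignment: Dict[str, str] = {}
    for pname, p in poor.items():
        # The nearest neighbor is chosen among the original good models only.
        target = min(good, key=lambda g: weighted_distance(p, good[g]))
        assignment[pname] = target
        st = stats[target]
        st[0] += p.count
        for d, (mu, v) in enumerate(zip(p.mean, p.var)):
            st[1][d] += p.count * mu
            st[2][d] += p.count * (v + mu * mu)
    # "Retraining" approximated as the ML single-Gaussian fit to the pooled data.
    merged: Dict[str, Model] = {}
    for name, (n, s1, s2) in stats.items():
        mean = [x / n for x in s1]
        var = [max(x2 / n - mu * mu, VAR_FLOOR) for x2, mu in zip(s2, mean)]
        merged[name] = Model(mean, var, n)
    return merged, assignment


if __name__ == "__main__":
    good = {"A": Model([0.0], [1.0], 100), "B": Model([10.0], [1.0], 100)}
    poor = {"c": Model([1.0], [1.0], 5), "d": Model([9.0], [1.0], 5)}
    merged, assignment = merge_poor_models(good, poor)
    print(assignment)  # {'c': 'A', 'd': 'B'}
```

One consequence of the literal weighting: when two candidates are equally divergent from a sparse model, the better-populated candidate yields the smaller weighted distance, so sparse triphones tend to be absorbed by well-trained models.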

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Junqua, Jean-Claude
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Chawan, Vijay B.
Application Number: US08/642,767
Time in Patent Office: 855 Days
Field of Search: 395/2.54, 395/2.65, 395/2.49, 395/2.63, 395/2.64, 395/2.53, 395/2.6, 395/2.61, 395/2.45
US Class Current: 704/245
CPC Class Codes:
G10L 15/04   Segmentation; Word boundary...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 2015/022   Demisyllables, biphones or ...
G10L 2015/0631   Creating reference template...
