Method and apparatus for speech recognition using optimized partial mixture tying of HMM state functions

US 5,825,978 A
Filed: 07/18/1994
Issued: 10/20/1998
Est. Priority Date: 07/18/1994
Status: Expired due to Term

Abstract

In accordance with the invention, a speech recognizer is provided which uses a computationally-feasible method for constructing a set of Hidden Markov Models (HMMs) for speech recognition that utilize a partial and optimal degree of mixture tying. With partially-tied HMMs, improved recognition accuracy of a large vocabulary word corpus as compared to systems that use fully-tied HMMs is achieved with less computational overhead than with a fully untied system. The computationally-feasible technique comprises the steps of determining a cluster of HMM states that share Gaussian components which are close together, developing a subset codebook for those clusters, and recalculating the Gaussians in the codebook to best estimate the clustered states.
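
In one common notation (the symbols below are assumptions for illustration, not taken from the patent text), the three degrees of tying that the abstract contrasts differ only in which set of Gaussians a state s may draw on:

```latex
% Fully tied: every state mixes over one shared codebook Q.
b_s(x) = \sum_{q \in Q} p(q \mid s)\, \mathcal{N}(x;\, \mu_q, \Sigma_q)

% Partially tied: state s mixes only over the codebook Q_{c(s)} of its cluster c(s).
b_s(x) = \sum_{q \in Q_{c(s)}} p(q \mid s)\, \mathcal{N}(x;\, \mu_q, \Sigma_q)

% Fully untied: each state owns a private set of Gaussians Q_s.
b_s(x) = \sum_{q \in Q_s} p(q \mid s)\, \mathcal{N}(x;\, \mu_q, \Sigma_q)
```

Under this reading, the number of clusters sets the trade-off: one cluster recovers the fully tied system, and one cluster per state recovers the fully untied one.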

17 Claims

1. In a speech recognition system using a method for recognizing human speech, the method being of the type comprising the steps of:
- selecting a model to represent a selected subunit of speech, the model having associated with it a plurality of states and each state having associated with it a probability function, the probability function having undetermined parameters, the probability functions being represented by a mixture of simple probability functions, the simple probability functions being stored in a master codebook;
  
  extracting features from a set of speech training data;
  
  using the features to determine parameters for the probability functions in the model, an improvement comprising the steps of:
  
  identifying states that are mostly represented by a related set of simple probability functions;
  
  clustering said states that are mostly represented by a related set of simple probability functions into a plurality of clusters;
  
  splitting up the master codebook into a plurality of cluster codebooks, one cluster codebook associated with each one of said clusters;
  
  pruning the cluster codebooks to reduce the number of entries in each said codebook by retaining the simple probability functions that are most used by the states in the cluster and deleting remaining functions; and
  
  re-estimating the simple probability functions in each cluster codebook to better fit the states in that cluster and re-estimating the parameters for each state in the cluster.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method according to claim 1 wherein the simple probability functions are Gaussians.
  - 3. The method according to claim 1 wherein the number of said plurality of clusters is an arbitrary number selected based on system resources and desired performance characteristics.
  - 4. The method according to claim 1 wherein all the states in a cluster are states of one phone and its allophones.
  - 5. The method according to claim 1 wherein the states of one phone use different cluster codebooks.
  - 6. The method according to claim 1 wherein the model is a three-state Hidden Markov Model.
  - 7. The method according to claim 1 wherein states are clustered according to an agglomerative hierarchical clustering scheme.
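
The splitting and pruning steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_and_prune`, the dictionary layout, and the choice of summed mixture weights as the measure of "most used" are all assumptions.

```python
def split_and_prune(state_weights, state_cluster, keep):
    """Split a master codebook into per-cluster codebooks and prune each one.

    state_weights: dict state -> list of mixture weights over the master codebook.
    state_cluster: dict state -> cluster id (from an earlier clustering step).
    keep: number of master-codebook entries retained in each cluster codebook.
    Assumption: "most used" is measured by the summed mixture weight in a cluster.
    """
    # Accumulate how heavily each cluster uses each master-codebook entry.
    usage = {}
    for state, weights in state_weights.items():
        totals = usage.setdefault(state_cluster[state], [0.0] * len(weights))
        for q, w in enumerate(weights):
            totals[q] += w

    # Keep the `keep` most-used entries per cluster, stored as master indices.
    codebooks = {}
    for cluster, totals in usage.items():
        ranked = sorted(range(len(totals)), key=lambda q: totals[q], reverse=True)
        codebooks[cluster] = sorted(ranked[:keep])

    # Restrict each state's weights to its cluster codebook and renormalize.
    pruned = {}
    for state, weights in state_weights.items():
        kept = [weights[q] for q in codebooks[state_cluster[state]]]
        total = sum(kept)
        if total > 0.0:
            pruned[state] = [w / total for w in kept]
        else:
            pruned[state] = [1.0 / len(kept)] * len(kept)  # fall back to uniform
    return codebooks, pruned
```

The renormalized weights are only a starting point; the final step of claim 1 re-estimates both the retained functions and the per-state parameters.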

8. In a speech recognition system for responding to signals representative of digital speech, a method for developing models for subsets of speech comprising the steps of:
- selecting a multi-state model with state probability functions, said probability functions being of a general form with initially undetermined parameters;
  
  creating an individual instance of a model for each subunit of speech to be processed;
  
  clustering states based on their acoustic similarity;
  
  creating a plurality of cluster codebooks, one codebook for each cluster;
  
  said cluster codebooks consisting of probability density functions that are shared by each cluster's states;
  
  estimating the probability densities of each cluster codebook and the parameters of the probability equations in each cluster.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method according to claim 8 wherein the simple probability functions are Gaussians.
  - 10. The method according to claim 8 wherein the number of said clusters is an arbitrary number selected based on system resources and desired performance characteristics.
  - 11. The method according to claim 8 wherein all the states in a cluster are states of one phone and its allophones.
  - 12. The method according to claim 8 wherein the states of one phone use different cluster codebooks.
  - 13. The method according to claim 8 wherein the model is a three-state Hidden Markov Model.
  - 14. The method according to claim 8 wherein states are clustered according to an agglomerative hierarchical clustering scheme.
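
Claims 7 and 14 call for agglomerative hierarchical clustering of states, and claims 3 and 10 leave the number of clusters open. The sketch below shows one way such a scheme could look. The distance used here, the count-weighted increase in entropy of the mixture-weight distributions when two clusters merge, is an assumption, as are the names `agglomerate` and `merge_cost`; the claims themselves only require clustering by acoustic similarity.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def merge_cost(a, b):
    """Count-weighted entropy increase from pooling two (weights, count) pairs."""
    (pa, na), (pb, nb) = a, b
    n = na + nb
    merged = [(na * x + nb * y) / n for x, y in zip(pa, pb)]
    cost = n * entropy(merged) - na * entropy(pa) - nb * entropy(pb)
    return cost, (merged, n)


def agglomerate(states, num_clusters):
    """Greedy bottom-up clustering.

    states: dict name -> (mixture weights over the master codebook, occupancy count).
    num_clusters: target number of clusters (an arbitrary, user-chosen value).
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    # Start with one cluster per state.
    clusters = [([name], dist) for name, dist in states.items()]
    while len(clusters) > num_clusters:
        best = None
        # Find the pair whose merge increases the weighted entropy the least.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, merged = merge_cost(clusters[i][1], clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j, merged)
        _, i, j, merged = best
        members = clusters[i][0] + clusters[j][0]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, merged))
    return [sorted(members) for members, _ in clusters]
```

The pairwise search is quadratic per merge, which is acceptable for a sketch but would likely need caching of costs for a realistic number of states.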

15. A method for recognizing speech using a computer comprising the steps of:
- selecting a multi-state model with state probability functions, said probability functions being of a general form with initially undetermined parameters;
  
  creating an individual instance of a model for each subunit of speech to be processed;
  
  training the models with a training algorithm that determines parameters for the models that best fit a set of features extracted from known speech samples;
  
  clustering states into a predetermined number of clusters of states wherein the states in each said cluster have probability functions that can be well represented by a shared group of simple probability functions;
  
  developing a cluster codebook of simple probability functions for each cluster and storing said cluster codebooks;
  
  storing for each state an identifier for a cluster codebook and an array of weighting factors;
  
  extracting features from a speech sample to be recognized; and
  
  using said state probability functions and said cluster codebooks to determine a most probable state sequence for said speech sample.
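
The storage layout and decoding step of claim 15 might be sketched as below. Everything here is illustrative: the `State` record, scalar (one-dimensional) features, and a plain Viterbi search are assumptions chosen to keep the example short, not details stated in the claim.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    codebook_id: int  # identifier of the cluster codebook this state uses
    weights: list     # one mixture weight per entry of that codebook


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def state_loglik(x, state, codebooks):
    """Log of the state's mixture density, using only its cluster codebook."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, (m, v) in zip(state.weights, codebooks[state.codebook_id])
             if w > 0.0]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def viterbi(features, states, codebooks, log_trans, log_init):
    """Return the most probable state index sequence for the feature sequence."""
    n = len(states)
    score = [log_init[s] + state_loglik(features[0], states[s], codebooks)
             for s in range(n)]
    back = []
    for x in features[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + state_loglik(x, states[s], codebooks))
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Because each state stores only a codebook identifier and a weight array, the Gaussians themselves are held once per cluster rather than once per state.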

16. In a speech recognition system for responding to signals representative of digital speech, a method for developing models for subsets of speech comprising the steps of:
- selecting a multi-state model with state probability functions, said probability functions being of a general form with initially undetermined parameters;
  
  creating an individual instance of a model for each subunit of speech to be processed;
  
  clustering states based on their acoustic similarities;
  
  creating a plurality of cluster codebooks, one cluster codebook for each cluster;
  
  each codebook comprising a group of shared probability functions; and
  
  re-estimating the probability densities of each cluster codebook and the parameters of the probability equations for each state in each cluster.
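
The re-estimation step of claim 16 could be approximated by one expectation-maximization pass over a single cluster, as sketched below. Several simplifications are assumed and are not from the claim: features are scalars, each training frame is already aligned to exactly one state (a real trainer would use state occupancies from forward-backward), and the name `em_step` and the variance floor are hypothetical.

```python
import math


def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(frames, weights, codebook, var_floor=1e-3):
    """One EM iteration for a single cluster.

    frames: dict state -> list of scalar observations aligned to that state.
    weights: dict state -> mixture weights over the cluster codebook.
    codebook: list of (mean, variance) pairs shared by all states in the cluster.
    """
    size = len(codebook)
    occ = [0.0] * size     # total posterior mass per Gaussian, over all states
    first = [0.0] * size   # posterior-weighted sum of x
    second = [0.0] * size  # posterior-weighted sum of x squared
    new_weights = {}
    for state, xs in frames.items():
        counts = [0.0] * size
        for x in xs:
            joint = [w * gauss(x, m, v)
                     for w, (m, v) in zip(weights[state], codebook)]
            total = sum(joint)
            if total == 0.0:
                continue  # frame is numerically unreachable; skip it
            for q, j in enumerate(joint):
                post = j / total
                counts[q] += post
                occ[q] += post
                first[q] += post * x
                second[q] += post * x * x
        norm = sum(counts)
        # Mixture weights are state-specific.
        new_weights[state] = ([c / norm for c in counts] if norm > 0.0
                              else list(weights[state]))
    # Gaussians are shared, so their statistics pool every state in the cluster.
    new_codebook = []
    for q, (m, v) in enumerate(codebook):
        if occ[q] > 0.0:
            m = first[q] / occ[q]
            v = max(second[q] / occ[q] - m * m, var_floor)
        new_codebook.append((m, v))
    return new_weights, new_codebook
```

The point of the sketch is the split in what gets pooled: weights are updated per state, while each Gaussian is updated from the data of all states that share the cluster codebook.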

17. A speech recognizer, comprising:
- a computer;
  
  storage means;
  
  a set of models for subunits of speech stored in the storage means;
  
  a feature extractor in the computer for extracting feature data capable of being processed by said computer from a speech signal;
  
  training means in the computer for training the models using features from identified samples of speech data and for producing a master codebook of probability density functions for use by the models;
  
  clustering means in the computer for identifying clusters of states that share subsets of the probability density functions in the codebooks;
  
  splitting and pruning means in the computer for producing cluster codebooks by splitting the master codebook into subsets of probability densities shared by clustered states;
  
  re-estimating means for retraining the models for the states in the clusters and for recalculating the probability densities in each cluster codebook;
  
  recognizing means for matching features from unidentified speech data to the models to produce a most likely path through the models where the path defines the most likely subunits and words in the speech data.

Current Assignee: SRI International, Inc.
Original Assignee: SRI International, Inc.
Inventors: Digalakis, Vassilios; Murveit, Hy
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): McFadden, Susan Iris
Application Number: US08/276,742
Time in Patent Office: 1,555 Days
Field of Search: 395/2.53-2.54, 395/2.59, 395/2.64-2.65
US Class Current: 704/256
CPC Class Codes: G10L 15/146 with insufficient amount of...
