Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system

US 7,454,341 B1
Filed: 09/30/2000
Issued: 11/18/2008
Est. Priority Date: 09/30/2000
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
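
The division step in the abstract, which claims 4 to 6 tie to the MFCC, delta, and delta-delta streams, can be illustrated with a short sketch. It assumes a 39-dimensional layout of 13 static MFCCs followed by 13 delta and 13 delta-delta coefficients (the claims name the three streams but not their sizes) and a hypothetical codebook size of 256 entries. `STREAM_SLICES`, `split_into_subvector_sets`, and `storage_floats` are illustrative names, not taken from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout (assumption): 13 static MFCCs, 13 delta, 13 delta-delta.
STREAM_SLICES: List[Tuple[int, int]] = [(0, 13), (13, 26), (26, 39)]


def split_into_subvector_sets(
    vectors: Sequence[Sequence[float]],
    slices: Sequence[Tuple[int, int]] = STREAM_SLICES,
) -> List[List[List[float]]]:
    """Divide an N x D vector set (means or variances) into one sub-vector set per stream.

    Each vector contributes exactly one sub-vector to each stream's set, and
    each set covers only the dimensions [lo, hi) of its stream.
    """
    return [[list(v[lo:hi]) for v in vectors] for lo, hi in slices]


def storage_floats(
    n_gaussians: int,
    slices: Sequence[Tuple[int, int]] = STREAM_SLICES,
    codebook_size: int = 256,  # hypothetical number of entries per codebook
) -> Tuple[int, int, int]:
    """Return (floats in the full model, floats in all codebooks, number of indices)."""
    dim = sum(hi - lo for lo, hi in slices)
    full = 2 * n_gaussians * dim             # one mean and one variance per dimension
    codebooks = 2 * codebook_size * dim      # a mean and a variance codebook per stream
    indices = 2 * n_gaussians * len(slices)  # one mean and one variance index per stream
    return full, codebooks, indices
```

With 10,000 Gaussians, `storage_floats(10000)` returns `(780000, 19968, 60000)`: the codebooks hold under 20,000 floats, and each of the 60,000 per-Gaussian indices fits in one byte when a codebook has at most 256 entries.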

33 Citations

23 Claims

1. A computer implemented method comprising:
- dividing a mean vector set having a plurality of dimensions into multiple mean sub-vector sets, the mean vector set including a mean vector of one of a set of N Gaussians, wherein the mean vector contributes only a sub-vector of the mean vector to one of the mean sub-vector sets, and wherein a set of all dimensions of the one of the mean sub-vector sets includes only a subset of the plurality of dimensions of the mean vector set;
  
  dividing a variance vector set having a plurality of dimensions into multiple variance sub-vector sets, the variance vector set including a variance vector of one of the set of N Gaussians, wherein the variance vector contributes only a sub-vector of the variance vector to one of the variance sub-vector sets, and wherein a set of all dimensions of the one of the variance sub-vector sets includes only a subset of the plurality of dimensions of the variance vector set;
  
  clustering each resultant sub-vector set to build a codebook for the respective sub-vector set according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically:
  assigns each sub-vector in the respective sub-vector set to a respective cluster in a current cluster set,
  based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current set of clusters to create a new cluster set, and
  splits a cluster in the new cluster set based upon an average distortion of the cluster in the new cluster set;
  
  decoding information related to a speech signal using said clustered sub-vector sets; and
  
  providing a set of one or more words corresponding to the speech signal based on the decoded information.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1 wherein the N Gaussians represent corresponding observation probability functions of previously trained continuous density hidden Markov models (CDHMMs).
  - 3. The method of claim 2 further comprising:
    - training the CDHMMs using a training set of speech wherein each frame of training speech is represented by a corresponding feature vector having a plurality of feature components.
  - 4. The method of claim 3 wherein the plurality of feature components comprises a first stream of Mel-Frequency Cepstral Coefficients (MFCCs), a second stream of the first order derivatives of the MFCCs (delta MFCCs), and a third stream of the second order derivatives of the MFCCs (delta-delta MFCCs).
  - 5. The method of claim 4 wherein each sub-vector set corresponds to a subset of the plurality of feature components.
  - 6. The method of claim 5 wherein the mean vector set and the variance vector set each is divided into three sub-vector sets, respectively, each respective sub-vector set corresponding to a distinct stream of the feature components.
  - 7. The method of claim 1 wherein the modified K-means clustering process further creates an initial codebook for the respective sub-vector set using the entire sub-vector set as the initial cluster;
    - and performs the iteration until a predetermined number of iterations is performed, the iteration further to dynamically:
      
      split each cluster in a previous cluster set into two new clusters, the splitting to form the current cluster set, create a corresponding codebook based upon the current cluster set, and compute a centroid for each cluster in the new cluster set.
  - 8. The method of claim 7 wherein creating the initial codebook comprises:
    - computing the centroid of the entire sub-vector set.
  - 9. The method of claim 7 wherein the modified K-means clustering process further determines whether the predetermined number of iterations is reached.
  - 10. The method of claim 9 wherein splitting each cluster in a previous cluster set comprises:
    - computing, for all sub-vectors in the respective cluster, the average variance to the centroid of the respective cluster; and
      
      creating two new centroids based upon the centroid of the respective cluster and the average variance computed.
  - 11. The method of claim 10 further comprising:
    - combining all centroids that have been created thus far to build the corresponding codebook; and
      
      initializing the value of the total accumulated distance to a predetermined value.
  - 12. The method of claim 11 wherein assigning each sub-vector in the respective sub-vector set to a respective cluster in a current cluster set comprises:
    - calculating the distance from the respective sub-vector to each of the existing centroids; and
      
      associating the respective sub-vector with the nearest centroid.
  - 13. The method of claim 12 wherein Bhattacharyya distance is used as a distance measure.
  - 14. The method of claim 12 wherein the iteration of the modified K-means clustering process further calculates the total distortion and number of associated sub-vectors for each cluster;
    - wherein reassigning each sub-vector assigned to the particular cluster and removing the particular cluster from the current set of clusters based upon a size of a particular cluster comprises reassigning and removing for each cluster whose number of associated sub-vectors is less than a predetermined size; and
      
      wherein splitting a cluster in the new cluster set comprises splitting a cluster in the new cluster set that has the maximum average distortion, where the new cluster set was created upon the merging of a cluster.
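
Claims 7 through 14 spell out the modified K-means iteration: start from the centroid of the whole sub-vector set, split every cluster in two using the average variance around its centroid, assign each sub-vector to its nearest centroid, remove clusters smaller than a predetermined size while reassigning their members, split the cluster with the largest average distortion, and recompute centroids. The sketch below is one reading of those steps, not the inventors' implementation. Several details are assumptions: squared Euclidean distance stands in for the Bhattacharyya distance of claim 13, one maximum-distortion split is performed per removed cluster, the split offset `scale` is hypothetical, empty clusters are dropped, and the largest cluster is never removed. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (assumed stand-in for the measure in claim 13).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(members) for col in zip(*members)]


def split_centroid(members: Sequence[Sequence[float]], scale: float) -> Tuple[Vector, Vector]:
    # The average per-dimension variance around the centroid sets how far apart
    # the two new centroids are placed (claim 10); `scale` is hypothetical.
    c = centroid(members)
    var = [sum((m[d] - c[d]) ** 2 for m in members) / len(members) for d in range(len(c))]
    step = [scale * math.sqrt(v) for v in var]
    return [x + s for x, s in zip(c, step)], [x - s for x, s in zip(c, step)]


def nearest(x: Sequence[float], centroids: Sequence[Vector]) -> int:
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def avg_distortion(members: Sequence[Sequence[float]]) -> float:
    c = centroid(members)
    return sum(sq_dist(m, c) for m in members) / len(members)


def modified_kmeans(data: Sequence[Vector], iterations: int = 3,
                    min_size: int = 2, scale: float = 0.5) -> List[Vector]:
    clusters = [list(data)]  # initial cluster: the entire sub-vector set
    for _ in range(iterations):
        # Split every cluster of the previous set into two new centroids.
        centroids = [c for m in clusters for c in split_centroid(m, scale)]
        # Assign each sub-vector to its nearest centroid.
        assigned: List[List[Vector]] = [[] for _ in centroids]
        for x in data:
            assigned[nearest(x, centroids)].append(x)
        # Merge: remove non-empty clusters below min_size (the largest always
        # survives) and reassign their sub-vectors; empty clusters are dropped.
        live = sorted(((c, m) for c, m in zip(centroids, assigned) if m),
                      key=lambda cm: len(cm[1]), reverse=True)
        kept = live[:1] + [cm for cm in live[1:] if len(cm[1]) >= min_size]
        removed = [cm for cm in live[1:] if len(cm[1]) < min_size]
        centroids = [c for c, _ in kept]
        clusters = [m for _, m in kept]
        for _, members in removed:
            for x in members:
                clusters[nearest(x, centroids)].append(x)
        # Split: once per removed cluster, split the cluster with maximum average distortion.
        for _ in removed:
            j = max(range(len(clusters)), key=lambda k: avg_distortion(clusters[k]))
            a, b = split_centroid(clusters[j], scale)
            left = [x for x in clusters[j] if sq_dist(x, a) <= sq_dist(x, b)]
            right = [x for x in clusters[j] if sq_dist(x, a) > sq_dist(x, b)]
            clusters[j:j + 1] = [m for m in (left, right) if m]
    # The codebook is the centroid of each final cluster.
    return [centroid(m) for m in clusters]
```

On four well-separated groups of one-dimensional points (around 0, 10, 20, and 30), two iterations grow the codebook from one to two to four entries, landing on the group means. With a very large `min_size`, the merge step collapses the clusters and the maximum-distortion split restores only as many as were removed.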

15. A system for speech recognition comprising:
- an acoustic model comprising:
  
  a first plurality of codebooks, each codebook of the first plurality of codebooks being built based upon a mean sub-vector set of a mean vector set, the mean vector set having a plurality of dimensions, the mean vector set including a mean vector of one of a set of N Gaussians that represents state observation probability distributions of previously trained continuous density hidden Markov models (CDHMMs), wherein the mean vector contributes only a sub-vector of the mean vector to the mean sub-vector set, and wherein a set of all dimensions of the mean sub-vector set includes only a subset of the plurality of dimensions of the mean vector set,
  a second plurality of codebooks, each codebook of the second plurality of codebooks being built based upon a variance sub-vector set of a variance vector set, the variance vector set having a plurality of dimensions, the variance vector set including a variance vector of one of a set of N Gaussians that represents state observation probability distributions of previously trained continuous density hidden Markov models (CDHMMs), wherein the variance vector contributes only a sub-vector of the variance vector to the variance sub-vector set, and wherein a set of all dimensions of the variance sub-vector set includes only a subset of the plurality of dimensions of the variance vector set,
  each codebook being built according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically:
  assigns each sub-vector in a respective sub-vector set to a cluster in a current cluster set,
  based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current cluster set to create a new cluster set, and
  splits a cluster in the new cluster set based upon an average distortion of the cluster in the new cluster set;
  
  a feature extraction unit to convert an input signal representing an input speech into a set of feature vectors each representing a corresponding frame of the input signal; and
  
  a decoder coupled to the acoustic model and the feature extraction unit, the decoder to provide a set of one or more words corresponding to the input speech based, at least in part, upon the feature vectors and the acoustic model.
- Dependent Claims (16, 17, 18, 19)
- - 16. The system of claim 15 wherein the mean vector set and the variance vector set each is divided into three sub-vector sets, the first sub-vector set corresponding to a stream of Mel-Frequency Cepstral Coefficients (MFCCs), the second sub-vector set corresponding to a stream of the first order derivatives of the MFCCs (delta MFCCs), and the third sub-vector set corresponding to a stream of the second order derivatives of the MFCCs (delta-delta MFCCs).
  - 17. The system of claim 15 wherein the modified K-means clustering process further creates an initial codebook for the respective sub-vector set using the entire vector set as the initial cluster;
    - and performs the iteration until a predetermined number of iterations is performed, the iteration further to dynamically:
      
      split each cluster in a previous cluster set into two new clusters, the splitting to form the current cluster set, create a corresponding codebook based upon the current cluster set, and compute a centroid for each cluster in the new cluster set.
  - 18. The system of claim 17 wherein splitting each cluster in a previous cluster set into two new clusters comprises:
    - computing the average variance to the centroid of the respective cluster for all sub-vectors in the respective cluster;
      
      creating two new centroids based upon the centroid of the respective cluster and the average variance computed; and
      
      combining all centroids that have been created thus far to build the corresponding codebook for the current iteration.
  - 19. The system of claim 18 wherein assigning each sub-vector in the respective sub-vector set to a respective cluster in a current cluster set comprises:
    - calculating the distance from the respective sub-vector to each of the centroids and associating the respective sub-vector with the nearest centroid; and
      
      wherein the iteration of the modified K-means clustering process further calculates the total distortion and number of associated sub-vectors for each cluster;
      
      wherein reassigning each sub-vector assigned to the particular cluster and removing the particular cluster from the current set of clusters based upon a size of a particular cluster comprises reassigning and removing for each cluster whose number of associated sub-vectors is less than a predetermined size; and
      
      wherein splitting a cluster in the new cluster set comprises splitting a cluster in the new cluster set that has the maximum average distortion where the new cluster set was created upon the merging of a cluster.
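
In the system of claim 15, the acoustic model holds codebooks rather than full Gaussian parameters. The following is a minimal sketch of how a decoder might score one feature vector against such a model. It assumes diagonal-covariance Gaussians, a model in which each Gaussian is stored as one (mean index, variance index) pair per stream, and a per-frame cache of stream-level partial scores; the cache is an illustrative optimization, and none of these names or data layouts come from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

Codebook = List[List[float]]


def partial_loglik(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal Gaussian restricted to one stream's dimensions."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gaussian_logliks(
    frame: Sequence[float],
    slices: Sequence[Tuple[int, int]],
    mean_books: Sequence[Codebook],
    var_books: Sequence[Codebook],
    gaussians: Sequence[Sequence[Tuple[int, int]]],
) -> List[float]:
    """Score one frame against every Gaussian in a (hypothetical) compact model.

    gaussians[g][s] is the (mean index, variance index) pair of Gaussian g in stream s.
    A diagonal Gaussian factorizes over dimensions, so its log-likelihood is the
    sum of its per-stream partial scores.
    """
    cache: Dict[Tuple[int, int, int], float] = {}
    scores = []
    for streams in gaussians:
        total = 0.0
        for s, ((lo, hi), (mi, vi)) in enumerate(zip(slices, streams)):
            key = (s, mi, vi)
            if key not in cache:  # Gaussians sharing a codeword pair reuse the score
                cache[key] = partial_loglik(frame[lo:hi], mean_books[s][mi], var_books[s][vi])
            total += cache[key]
        scores.append(total)
    return scores
```

Because the partial scores depend only on codeword indices, Gaussians that share a (mean, variance) codeword pair in a stream reuse one computation per frame, which is one plausible way a compact model can also reduce decoding cost.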

20. A machine-readable medium comprising instructions which, when executed by a machine, cause the machine to perform operations for speech recognition comprising:
- dividing a mean vector set having a plurality of dimensions into multiple mean sub-vector sets, the mean vector set including a mean vector of one of a set of N Gaussians, wherein the mean vector contributes only a sub-vector of the mean vector to one of the mean sub-vector sets, and wherein a set of all dimensions of the one of the mean sub-vector sets includes only a subset of the plurality of dimensions of the mean vector set;
  
  dividing a variance vector set having a plurality of dimensions into multiple variance sub-vector sets, the variance vector set including a variance vector of one of the set of N Gaussians, wherein the variance vector contributes only a sub-vector of the variance vector to one of the variance sub-vector sets, and wherein a set of all dimensions of the one of the variance sub-vector sets includes only a subset of the plurality of dimensions of the variance vector set;
  
  clustering each resultant sub-vector set to build a codebook for the respective sub-vector set according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically:
  assigns each sub-vector in the respective sub-vector set to a cluster in a current cluster set,
  based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current cluster set to create a new cluster set, and
  splits a cluster of the new cluster set based upon an average distortion of the cluster of the new cluster set;
  
  decoding information related to a speech signal using said clustered sub-vector sets; and
  
  providing a set of one or more words corresponding to the speech signal based on the decoded information.
- Dependent Claims (21, 22, 23)
- - 21. The machine-readable medium of claim 20 wherein the mean vector set and the variance vector set each is divided into three sub-vector sets, respectively, each respective sub-vector set corresponding to a distinct stream of feature components.
  - 22. The machine-readable medium of claim 20 wherein the modified K-means clustering process further creates an initial codebook for the respective sub-vector set using the entire sub-vector set as the initial cluster;
    - and performs the iteration until a predetermined number of iterations is performed, the iteration further to dynamically:
      
      split each cluster in a previous cluster set into two new clusters, the splitting to form the current cluster set, create a corresponding codebook based upon the current cluster set, and compute a centroid for each cluster in the new cluster set.
  - 23. The machine-readable medium of claim 22 wherein the iteration of the modified K-means clustering process further calculates the total distortion and number of associated sub-vectors for each cluster;
    - wherein reassigning each vector assigned to the particular cluster and removing the particular cluster from the current set of clusters based upon a size of a particular cluster comprises reassigning and removing for each cluster whose number of associated vectors is less than a predetermined size; and
      
      wherein splitting a cluster in the new cluster set comprises splitting a cluster in the new cluster set that has the maximum average distortion, where the new cluster set was created upon the merging of a cluster.
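
Claim 13 names the Bhattacharyya distance as the distance measure used when assigning sub-vectors to centroids. The claims do not say how it is applied when means and variances are clustered into separate codebooks, so the following is only the standard closed form for two diagonal-covariance Gaussians, written as a hypothetical helper `bhattacharyya_diag`.

```python
import math
from typing import Sequence


def bhattacharyya_diag(mu1: Sequence[float], var1: Sequence[float],
                       mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    With avg = (var1 + var2) / 2 per dimension:
        D = sum((mu1 - mu2)^2 / (8 * avg)) + 0.5 * sum(ln(avg / sqrt(var1 * var2)))
    """
    d = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        avg = 0.5 * (v1 + v2)
        d += (m1 - m2) ** 2 / (8.0 * avg)              # mean-separation term
        d += 0.5 * math.log(avg / math.sqrt(v1 * v2))  # variance-mismatch term
    return d
```

When both variances are 1 in every dimension, the second term vanishes and the distance reduces to one-eighth of the squared Euclidean distance between the means.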

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Pan, Jielin; Yuan, Baosheng
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Lennox, Natalie
Application Number: US10/148,028
Time in Patent Office: 2,971 Days
Field of Search: 704/231, 704/256.7, 704/256.1, 704/245, 704/238, 704/256.8, 704/256, 704/232
US Class Current: 704/256
CPC Class Codes:
G06F 18/23213   with fixed number of cluste...
G10L 15/063   Training
G10L 2015/0631   Creating reference template...