Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system
First Claim
1. A computer implemented method comprising:
dividing a mean vector set having a plurality of dimensions into multiple mean sub-vector sets, the mean vector set including a mean vector of one of a set of N Gaussians, wherein the mean vector contributes only a sub-vector of the mean vector to one of the mean sub-vector sets, and wherein a set of all dimensions of the one of the mean sub-vector sets includes only a subset of the plurality of dimensions of the mean vector set;
dividing a variance vector set having a plurality of dimensions into multiple variance sub-vector sets, the variance vector set including a variance vector of one of the set of N Gaussians, wherein the variance vector contributes only a sub-vector of the variance vector to one of the variance sub-vector sets, and wherein a set of all dimensions of the one of the variance sub-vector sets includes only a subset of the plurality of dimensions of the variance vector set;
clustering each resultant sub-vector set to build a codebook for the respective sub-vector set according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically assigns each sub-vector in the respective sub-vector set to a respective cluster in a current cluster set, based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current set of clusters to create a new cluster set, and splits a cluster in the new cluster set based upon an average distortion of the cluster in the new cluster set;
decoding information related to a speech signal using said clustered sub-vector sets; and
providing a set of one or more words corresponding to the speech signal based on the decoded information.
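The two dividing steps of the claim can be illustrated with a minimal Python sketch. It assumes the mean and variance vectors are plain lists of floats and that the grouping of dimensions into sub-vector sets is given as lists of dimension indices. The function name `split_into_subvector_sets`, the `streams` layout, and the toy values are hypothetical and are not taken from the patent.

```python
def split_into_subvector_sets(vectors, streams):
    """Split N full vectors into one sub-vector set per group of dimensions.

    vectors: list of N vectors, each with D components.
    streams: list of lists of dimension indices (assumed layout); each
             group covers only a subset of the D dimensions.
    Returns a list with one sub-vector set per group; every set holds N
    sub-vectors, so each original vector contributes exactly one
    sub-vector to each set.
    """
    return [[[vec[d] for d in dims] for vec in vectors] for dims in streams]


# Hypothetical toy model: N = 3 Gaussians, D = 4 dimensions.
means = [[0.0, 1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0, 7.0],
         [8.0, 9.0, 10.0, 11.0]]
variances = [[1.0, 1.5, 2.0, 2.5],
             [1.0, 1.0, 1.0, 1.0],
             [0.5, 0.5, 0.5, 0.5]]
streams = [[0, 1], [2, 3]]  # assumed split into two 2-dimensional groups

mean_sets = split_into_subvector_sets(means, streams)
variance_sets = split_into_subvector_sets(variances, streams)

print(mean_sets[0])      # [[0.0, 1.0], [4.0, 5.0], [8.0, 9.0]]
print(variance_sets[1])  # [[2.0, 2.5], [1.0, 1.0], [0.5, 0.5]]
```

Each resulting set would then be clustered separately to build its own codebook.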
Abstract
According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
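One possible reading of the merge-and-split behaviour described in the abstract is sketched below in Python. This is an illustrative sketch under stated assumptions, not the patented procedure: the squared Euclidean distance, the initialization from the first `k` points, the `min_size` threshold, the rule of splitting only the single cluster with the highest average distortion, and the additive `eps` perturbation are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centroids):
    """Assign every point to its nearest centroid; returns member lists."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def avg_distortion(members, center):
    return sum(sq_dist(p, center) for p in members) / len(members)


def modified_kmeans(points, k, min_size=2, iters=20, eps=1e-3):
    """Return a codebook (list of centroids). Assumes min_size >= 1."""
    centroids = [list(p) for p in points[:k]]  # assumed initialization
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Merge: remove undersized clusters. Re-running the assignment on
        # the surviving centroids moves the orphaned members elsewhere.
        kept = [c for c, m in zip(centroids, clusters) if len(m) >= min_size]
        if not kept:
            kept = [centroid(points)]
        clusters = assign(points, kept)
        centroids = [centroid(m) for m in clusters]
        # Split: if a cluster was removed, split the cluster with the
        # highest average distortion by perturbing its centroid.
        if len(centroids) < k:
            worst = max(range(len(centroids)),
                        key=lambda i: avg_distortion(clusters[i], centroids[i]))
            c = centroids[worst]
            centroids[worst] = [x + eps for x in c]
            centroids.append([x - eps for x in c])
    # Final pass: drop empty clusters and recompute the centroids.
    return [centroid(m) for m in assign(points, centroids) if m]


# Hypothetical 2-dimensional sub-vector set with two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
print(modified_kmeans(points, k=2))  # [[0.5, 0.5], [10.5, 10.5]]
```

In this toy run no cluster falls below `min_size`, so the merge and split branches are not exercised; they come into play when an assignment leaves a cluster with too few sub-vectors.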
23 Claims
1. A computer implemented method comprising:
dividing a mean vector set having a plurality of dimensions into multiple mean sub-vector sets, the mean vector set including a mean vector of one of a set of N Gaussians, wherein the mean vector contributes only a sub-vector of the mean vector to one of the mean sub-vector sets, and wherein a set of all dimensions of the one of the mean sub-vector sets includes only a subset of the plurality of dimensions of the mean vector set;
dividing a variance vector set having a plurality of dimensions into multiple variance sub-vector sets, the variance vector set including a variance vector of one of the set of N Gaussians, wherein the variance vector contributes only a sub-vector of the variance vector to one of the variance sub-vector sets, and wherein a set of all dimensions of the one of the variance sub-vector sets includes only a subset of the plurality of dimensions of the variance vector set;
clustering each resultant sub-vector set to build a codebook for the respective sub-vector set according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically assigns each sub-vector in the respective sub-vector set to a respective cluster in a current cluster set, based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current set of clusters to create a new cluster set, and splits a cluster in the new cluster set based upon an average distortion of the cluster in the new cluster set;
decoding information related to a speech signal using said clustered sub-vector sets; and
providing a set of one or more words corresponding to the speech signal based on the decoded information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A system for speech recognition comprising:
an acoustic model comprising;
  a first plurality of codebooks, each codebook of the first plurality of codebooks being built based upon a mean sub-vector set of a mean vector set, the mean vector set having a plurality of dimensions, the mean vector set including a mean vector of one of a set of N Gaussians that represents state observation probability distributions of previously trained continuous density hidden Markov models (CDHMMs), wherein the mean vector contributes only a sub-vector of the mean vector to the mean sub-vector set, and wherein a set of all dimensions of the mean sub-vector set includes only a subset of the plurality of dimensions of the mean vector set,
  a second plurality of codebooks, each codebook of the second plurality of codebooks being built based upon a variance sub-vector set of a variance vector set, the variance vector set having a plurality of dimensions, the variance vector set including a variance vector of one of a set of N Gaussians that represents state observation probability distributions of previously trained continuous density hidden Markov models (CDHMMs), wherein the variance vector contributes only a sub-vector of the variance vector to the variance sub-vector set, and wherein a set of all dimensions of the variance sub-vector set includes only a subset of the plurality of dimensions of the variance vector set,
  each codebook being built according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically assigns each sub-vector in a respective sub-vector set to a cluster in a current cluster set, based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current cluster set to create a new cluster set, and splits a cluster in the new cluster set based upon an average distortion of the cluster in the new cluster set;
a feature extraction unit to convert an input signal representing an input speech into a set of feature vectors each representing a corresponding frame of the input signal; and
a decoder coupled to the acoustic model and the feature extraction unit, the decoder to provide a set of one or more words corresponding to the input speech based, at least in part, upon the feature vectors and the acoustic model.
Dependent claims: 16, 17, 18, 19
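The claim does not spell out how the decoder consults the codebooks. As a hedged illustration only, the Python sketch below assumes that each Gaussian is stored as one codeword index per mean codebook and one per variance codebook, and that a diagonal-covariance log-likelihood is accumulated group by group. The storage layout, the names `gaussian_log_likelihood`, `mean_idx`, and `var_idx`, and the toy codebooks are assumptions, not details from the patent.

```python
import math


def stream_log_likelihood(x_sub, mean_sub, var_sub):
    """Log-density of a diagonal-covariance Gaussian over one sub-vector."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(x_sub, mean_sub, var_sub))


def gaussian_log_likelihood(x, streams, mean_codebooks, var_codebooks,
                            mean_idx, var_idx):
    """Score feature vector x against one quantized Gaussian.

    streams:        dimension indices of each sub-vector group (assumed).
    mean_codebooks: one list of mean codewords per group.
    var_codebooks:  one list of variance codewords per group.
    mean_idx:       per-group index of this Gaussian's mean codeword.
    var_idx:        per-group index of this Gaussian's variance codeword.
    """
    total = 0.0
    for s, dims in enumerate(streams):
        x_sub = [x[d] for d in dims]
        total += stream_log_likelihood(x_sub,
                                       mean_codebooks[s][mean_idx[s]],
                                       var_codebooks[s][var_idx[s]])
    return total


# Hypothetical codebooks for D = 4 dimensions split into two groups.
streams = [[0, 1], [2, 3]]
mean_codebooks = [[[0.0, 0.0], [1.0, 1.0]],
                  [[2.0, 2.0], [3.0, 3.0]]]
var_codebooks = [[[1.0, 1.0]],
                 [[1.0, 1.0], [4.0, 4.0]]]

# One Gaussian: mean (1, 1, 2, 2) and unit variances, stored as indices.
mean_idx, var_idx = [1, 0], [0, 0]
x = [1.0, 1.0, 2.0, 2.0]  # feature vector equal to the mean

score = gaussian_log_likelihood(x, streams, mean_codebooks, var_codebooks,
                                mean_idx, var_idx)
print(score)
```

Under this layout a Gaussian costs a few small integers instead of 2D floating-point values, which is one way the clustered sub-vector sets could yield a compact model.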
20. A machine-readable medium comprising instructions which, when executed by a machine, cause the machine to perform operations for speech recognition comprising:
dividing a mean vector set having a plurality of dimensions into multiple mean sub-vector sets, the mean vector set including a mean vector of one of a set of N Gaussians, wherein the mean vector contributes only a sub-vector of the mean vector to one of the mean sub-vector sets, and wherein a set of all dimensions of the one of the mean sub-vector sets includes only a subset of the plurality of dimensions of the mean vector set;
dividing a variance vector set having a plurality of dimensions into multiple variance sub-vector sets, the variance vector set including a variance vector of one of the set of N Gaussians, wherein the variance vector contributes only a sub-vector of the variance vector to one of the variance sub-vector sets, and wherein a set of all dimensions of the one of the variance sub-vector sets includes only a subset of the plurality of dimensions of the variance vector set;
clustering each resultant sub-vector set to build a codebook for the respective sub-vector set according to a modified K-means clustering process which, during an iteration of the modified K-means clustering process, dynamically assigns each sub-vector in the respective sub-vector set to a cluster in a current cluster set, based upon a size of a particular cluster in the current cluster set, reassigns each sub-vector assigned to the particular cluster to another cluster in the current cluster set, and removes the particular cluster from the current cluster set to create a new cluster set, and splits a cluster of the new cluster set based upon an average distortion of the cluster of the new cluster set;
decoding information related to a speech signal using said clustered sub-vector sets; and
providing a set of one or more words corresponding to the speech signal based on the decoded information.
Dependent claims: 21, 22, 23
Specification