I-Vector Based Clustering Training Data in Speech Recognition
Abstract
Methods and systems for i-vector based clustering training data in speech recognition are described. An i-vector may be extracted from each speech segment of speech training data to represent acoustic information. The i-vectors extracted from the speech training data may be clustered into multiple clusters using a hierarchical divisive clustering algorithm. An acoustic model may be trained using one of the multiple clusters. The trained acoustic model may then be used in speech recognition.
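The hierarchical divisive clustering step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the split rule or the distance measure, so everything here is an assumption: the cluster with the largest distortion is split in two with a 2-means refinement, squared Euclidean distance is used, and short lists of floats stand in for real i-vectors (extraction itself is not shown). The function names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(cluster):
    """Total squared distance of a cluster's members to its centroid."""
    c = mean(cluster)
    return sum(sq_dist(v, c) for v in cluster)


def split_in_two(vectors, iters=20):
    """Split one cluster into two with 2-means; return None if it cannot be split."""
    c = mean(vectors)
    # Deterministic seeds: the point farthest from the centroid,
    # then the point farthest from that first seed.
    a = max(vectors, key=lambda v: sq_dist(v, c))
    b = max(vectors, key=lambda v: sq_dist(v, a))
    centroids = [a, b]
    best = None
    for _ in range(iters):
        groups = ([], [])
        for v in vectors:
            nearer = 0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
            groups[nearer].append(v)
        if not groups[0] or not groups[1]:
            break  # degenerate case, e.g. all vectors identical
        best = groups
        centroids = [mean(groups[0]), mean(groups[1])]
    return best


def divisive_clustering(vectors, num_clusters):
    """Top-down clustering: start with one cluster and keep splitting the worst one."""
    clusters = [list(vectors)]
    while len(clusters) < num_clusters:
        i = max(range(len(clusters)), key=lambda j: distortion(clusters[j]))
        halves = split_in_two(clusters[i])
        if halves is None:
            break  # nothing left that can be split
        clusters[i:i + 1] = halves  # replace the parent with its two children
    return clusters
```

Calling `divisive_clustering(ivectors, 3)` on a list of vectors returns up to three lists of vectors; each list could then serve as the training set for one cluster-dependent acoustic model.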
20 Claims
1. A computer-implemented method for clustering training data in speech recognition, the method comprising:
extracting a plurality of i-vectors from speech data including a plurality of speech segments;
clustering the plurality of i-vectors into a plurality of clusters;
training an acoustic model using one of the plurality of clusters; and
recognizing one or more other speech segments using the trained acoustic model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
under control of one or more computing systems comprising one or more processors,
receiving speech data including a plurality of speech segments;
extracting an i-vector from a speech segment of the plurality of speech segments;
selecting a cluster corresponding to the i-vector; and
determining an acoustic model corresponding to the cluster; and
recognizing the speech segment using the acoustic model.
Dependent claims: 10, 11, 12, 13, 14
15. One or more computer-readable media storing instructions that are executable by one or more processors to perform acts comprising:
receiving a plurality of training speech segments;
extracting multiple i-vectors from the plurality of training speech segments based on a set of hyperparameters of the plurality of training speech segments, individual ones of the i-vectors of the multiple i-vectors corresponding to a training speech segment of the plurality of training speech segments;
clustering the i-vectors into multiple clusters;
training a cluster-dependent acoustic model using a cluster of the multiple clusters; and
recognizing an unknown speech segment using the cluster-dependent acoustic model.
Dependent claims: 16, 17, 18, 19, 20
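The recognition-time steps recited in claims 9 and 15 (select a cluster for a segment's i-vector, then use the acoustic model tied to that cluster) might look like the following sketch. The claims do not say how the cluster is selected; picking the centroid with the highest cosine similarity is an assumption made here for illustration, and `select_cluster`, `model_for_segment`, and the string model names are hypothetical placeholders rather than anything defined in the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_cluster(ivector, centroids):
    """Return the index of the centroid most similar to the i-vector (assumed rule)."""
    return max(range(len(centroids)),
               key=lambda k: cosine_similarity(ivector, centroids[k]))


def model_for_segment(ivector, centroids, models):
    """Map a segment's i-vector to its cluster index and cluster-dependent model."""
    k = select_cluster(ivector, centroids)
    return k, models[k]


# Hypothetical stand-ins: one centroid and one trained model per cluster.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
models = ["acoustic_model_0", "acoustic_model_1", "acoustic_model_2"]

cluster_index, model = model_for_segment([0.1, 2.0], centroids, models)
```

In a real system the selected model would be passed to the decoder for that segment; here it is only returned so the selection logic can be inspected.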
Specification