Feature space transformation for personalization using generalized i-vector clustering
Abstract
Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device, and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector estimation and residual noise estimation are performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are repeated until convergence.
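The clustering and assignment steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the i-vectors are toy 2-D points rather than vectors extracted with a UBM, a hard-assignment (k-means style) loop stands in for GMM training, and the function names `fit_centroids`, `nearest_centroid`, and `apply_affine` are hypothetical. The affine function only shows the general form of a CMLLR-style feature transform, x' = A x + b, under the assumption that one such transform is kept per cluster.

```python
import math

def nearest_centroid(ivector, centroids):
    """Return the index of the centroid closest (Euclidean) to ivector."""
    distances = [math.dist(ivector, c) for c in centroids]
    return distances.index(min(distances))

def fit_centroids(ivectors, k, iterations=20):
    """Hard-assignment (k-means style) stand-in for GMM training."""
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(v) for v in ivectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ivectors:
            groups[nearest_centroid(v, centroids)].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def apply_affine(features, A, b):
    """Apply an affine feature transform x' = A x + b to one feature frame."""
    return [sum(a * x for a, x in zip(row, features)) + bias
            for row, bias in zip(A, b)]

# Toy 2-D "i-vectors" from one device, forming two obvious groups (made-up data).
train = [[0.0, 0.1], [5.0, 5.1], [0.2, -0.1], [4.9, 5.0], [-0.1, 0.0], [5.1, 4.9]]
centroids = fit_centroids(train, k=2)

# At test time, a new utterance's i-vector is assigned to the closest centroid.
cluster = nearest_centroid([4.8, 5.2], centroids)
```

In a real system the selected cluster would determine which adaptation transform is applied to the utterance's acoustic features; here `apply_affine` could be called with that cluster's hypothetical `(A, b)` pair.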
19 Claims
1. A method for speech personalization, comprising:
receiving an utterance from a device;
estimating an i-vector using the utterance;
estimating hyperparameters for the utterance;
training a Gaussian Mixture Model (GMM) using the i-vectors extracted from a collection of utterances recorded from the device;
applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
assigning the utterance to a cluster in the GMM.
Dependent claims: 2, 3, 4, 5, 6, 7, 8

9. A computing device storing computer-executable instructions for speech personalization, comprising:
receiving an utterance from a device;
estimating an i-vector using the utterance;
estimating a residual noise for the utterance;
estimating hyperparameters for the utterance;
training a Gaussian Mixture Model (GMM) using the i-vectors extracted from a collection of utterances recorded from the device;
applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
assigning the utterance to a cluster in the GMM.
Dependent claims: 10, 11, 12, 13, 14, 15

16. A system for speech personalization, comprising:
a processor and memory;
an operating environment executing using the processor; and
a personalization manager that is configured to perform actions comprising:
receiving an utterance from a device;
estimating an i-vector using the utterance;
estimating a residual noise for the utterance;
estimating hyperparameters for the utterance;
training a Gaussian Mixture Model (GMM) using the estimated i-vectors from a collection of utterances received from the device;
training a Universal Background Model (UBM) using the utterance received from the device;
applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
assigning the utterance to a cluster in the GMM.
Dependent claims: 17, 18, 19

Specification