Feature space transformation for personalization using generalized i-vector clustering

US 9,208,777 B2
Filed: 01/25/2013
Issued: 12/08/2015
Est. Priority Date: 01/25/2013
Status: Active Grant


Abstract

Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence.
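
The clustering step described in the abstract (train a GMM on i-vectors, then assign an utterance to the cluster with the closest centroid) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names `train_gmm` and `closest_cluster`, the diagonal covariances, the farthest-point initialization, the Euclidean distance, and the use of short toy vectors in place of real i-vectors. UBM training, i-vector extraction, and residual noise and hyperparameter estimation are not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_gmm(vectors, k, iters=25, var_floor=1e-3):
    """Fit a k-component diagonal GMM with plain EM (toy stand-in)."""
    dim = len(vectors[0])
    # Assumed initialization: first vector, then farthest-point picks.
    means = [list(vectors[0])]
    while len(means) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, m) for m in means))
        means.append(list(far))
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component per vector.
        resp = []
        for x in vectors:
            logp = [
                math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means, and floored variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-10
            weights[c] = nc / len(vectors)
            means[c] = [
                sum(r[c] * x[d] for r, x in zip(resp, vectors)) / nc
                for d in range(dim)
            ]
            variances[c] = [
                max(
                    var_floor,
                    sum(r[c] * (x[d] - means[c][d]) ** 2
                        for r, x in zip(resp, vectors)) / nc,
                )
                for d in range(dim)
            ]
    return weights, means, variances


def closest_cluster(ivector, means):
    """Index of the GMM component whose mean (centroid) is nearest."""
    return min(range(len(means)), key=lambda c: sq_dist(ivector, means[c]))
```

In this sketch the training vectors would come from utterances collected on one device, and `closest_cluster` would be called on the vector estimated for each new utterance from that device.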

23 Citations


19 Claims

1. A method for speech personalization, comprising:
- receiving an utterance from a device;
- estimating an i-vector using the utterance;
- estimating hyperparameters for the utterance;
- training a Gaussian Mixture Model (GMM) using the i-vectors extracted from a collection of utterances recorded from the device;
- applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
- assigning the utterance to a cluster in the GMM.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, further comprising estimating a residual noise for the utterance.
  - 3. The method of claim 1, further comprising training a Universal Background Model (UBM) using the utterance received from the device.
  - 4. The method of claim 1, further comprising receiving additional utterances from the device and estimating the i-vector and estimating hyperparameters until convergence.
  - 5. The method of claim 1, wherein assigning the utterance to the cluster in the Gaussian Mixture Model (GMM) comprises assigning the utterance to a cluster with the closest centroid in the GMM.
  - 6. The method of claim 1, further comprising when performing Automatic Speech Recognition (ASR) using the estimated i-vectors, estimated hyperparameters and an estimated residual noise determined from the utterances received from the device in a UBM and a GMM.
  - 7. The method of claim 1, further comprising receiving the utterance from at least one of a gaming device; a tablet; and a smartphone.
  - 8. The method of claim 1, further comprising training a Universal Background Model using utterances consisting of utterances received from the device.

9. A computing device storing computer-executable instructions for speech personalization, comprising:
- receiving an utterance from a device;
- estimating an i-vector using the utterance;
- estimating a residual noise for the utterance;
- estimating hyperparameters for the utterance;
- training a Gaussian Mixture Model (GMM) using the i-vectors extracted from a collection of utterances recorded from the device;
- applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
- assigning the utterance to a cluster in the GMM.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The computing device of claim 9, further comprising training a Universal Background Model (UBM) using the utterance received from the device.
  - 11. The computing device of claim 9, further comprising receiving additional utterances from the device and estimating the i-vector and estimating hyperparameters until convergence.
  - 12. The computing device of claim 9, further comprising assigning the utterance to a cluster in the Gaussian Mixture Model (GMM).
  - 13. The computing device of claim 12, wherein assigning the utterance to the cluster in the Gaussian Mixture Model (GMM) comprises assigning the utterance to a cluster with the closest centroid in the GMM.
  - 14. The computing device of claim 9, further comprising when performing Automatic Speech Recognition (ASR) using the estimated i-vectors, estimated hyperparameters and an estimated residual noise determined from the utterances received from the device in a UBM and a GMM.
  - 15. The computing device of claim 9, further comprising training a Universal Background Model using utterances consisting of utterances received from the device.

16. A system for speech personalization, comprising:
- a processor and memory;
- an operating environment executing using the processor; and
- a personalization manager that is configured to perform actions comprising;
  - receiving an utterance from a device;
  - estimating an i-vector using the utterance;
  - estimating a residual noise for the utterance;
  - estimating hyperparameters for the utterance;
  - training a Gaussian Mixture Model (GMM) using the estimated i-vectors from a collection of utterances received from the device;
  - training a Universal Background Model (UBM) using the utterance received from the device;
  - applying unsupervised constrained maximum likelihood linear regression (CMLLR) to the utterance; and
  - assigning the utterance to a cluster in the GMM.
- Dependent claims (17, 18, 19):
  - 17. The system of claim 16, further comprising receiving additional utterances from the device and estimating the i-vector and estimating hyperparameters until convergence.
  - 18. The system of claim 16, further comprising assigning the utterance to a cluster with the closest centroid in the Gaussian Mixture Model (GMM).
  - 19. The system of claim 16, further comprising training a Universal Background Model using utterances consisting of utterances received from the device.
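
Each independent claim recites applying CMLLR to the utterance. CMLLR is a feature-space adaptation that maps each feature frame x to A x + b, and the sketch below shows only that application step. The names `apply_affine` and `transform_utterance`, the idea of keeping one (A, b) pair per cluster, and the toy values are assumptions made for illustration; the claims do not specify how transforms are stored, and the maximum-likelihood estimation of A and b is omitted.

```python
def apply_affine(frame, matrix, bias):
    """Return matrix @ frame + bias for a single feature frame."""
    return [
        sum(a * x for a, x in zip(row, frame)) + b
        for row, b in zip(matrix, bias)
    ]


def transform_utterance(frames, cluster_id, transforms):
    """Apply the affine transform stored for cluster_id to every frame.

    transforms is a hypothetical dict: cluster id -> (matrix, bias).
    """
    matrix, bias = transforms[cluster_id]
    return [apply_affine(frame, matrix, bias) for frame in frames]


# Toy usage: two 2-dimensional frames and two hypothetical cluster transforms.
transforms = {
    0: ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # identity transform
    1: ([[2.0, 0.0], [0.0, 0.5]], [1.0, -1.0]),  # scale, then shift
}
frames = [[1.0, 2.0], [3.0, 4.0]]
adapted = transform_utterance(frames, 1, transforms)
```

Keeping the transform lookup keyed by cluster is one plausible way to connect the cluster assignment to the feature-space transformation named in the title; other arrangements are possible.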

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Yao, Kaisheng; Gong, Yifan
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/750,870
Publication Number: US 20140214420A1
Time in Patent Office: 1,047 Days
Field of Search: 704231-257, 704270-275
US Class Current: 1/1
CPC Class Codes:
G10L 15/063 Training
G10L 2015/0635 updating or merging of old ...
