Distributed speaker adaptation
Abstract
Automatic speech recognition (ASR) may be performed on received utterances. The ASR may be performed by an ASR module of a computing device (e.g., a client device). The ASR may include: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, and updating the feature-space speaker adaptation parameters based on the feature vectors. The transcriptions may be based, at least in part, on an acoustic model and the updated feature vectors. Updated speaker adaptation parameters may be received from another computing device and incorporated into the ASR module.
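The client-side loop the abstract describes (generate feature vectors, adapt them in feature space, transcribe with the acoustic model, then refine the adaptation parameters) can be sketched as follows. This is a minimal illustrative sketch, not the patent's actual method: the class names, the identity-initialized affine transform, the toy mean-centering update, and the stub acoustic model are all assumptions.

```python
import numpy as np

class FeatureSpaceAdapter:
    """Hypothetical holder for feature-space speaker adaptation parameters,
    applied as an affine transform x' = A @ x + b (fMLLR-style in spirit)."""
    def __init__(self, dim):
        self.A = np.eye(dim)    # transform matrix, starts as identity
        self.b = np.zeros(dim)  # bias term

    def apply(self, feats):
        # feats: (num_frames, dim) -> adapted feature vectors
        return feats @ self.A.T + self.b

    def update(self, feats, lr=0.01):
        # Toy stand-in for a real adaptation step: nudge the bias so that
        # adapted features drift toward zero mean for this speaker.
        self.b -= lr * feats.mean(axis=0)

class ToyAcousticModel:
    """Stub in place of a real acoustic model; a real one decodes frames."""
    def transcribe(self, adapted_feats):
        return "<transcript of %d frames>" % len(adapted_feats)

def recognize(utterance_feats, adapter, acoustic_model):
    adapted = adapter.apply(utterance_feats)   # update feature vectors
    text = acoustic_model.transcribe(adapted)  # transcribe using the model
    adapter.update(utterance_feats)            # refine adaptation parameters
    return text
```

Note that the adaptation parameters change after every utterance, which is what lets them be updated far more often than the acoustic model itself.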
22 Claims
1. A method comprising:

performing, by a user device, automatic speech recognition (ASR) on received utterances, wherein performing the ASR includes: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors;

transmitting, by the user device, a representation of at least some of the utterances to a computing device for development of an updated acoustic model;

after transmitting the representation, receiving, by the user device, the updated acoustic model from the computing device, wherein the updated acoustic model is based on the representation; and

replacing, by the user device, the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 2, 3, 4, 5, 6 not shown.)
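The two-speed update cadence claimed above (per-utterance adaptation on the device, with the acoustic model replaced only once the server has received a threshold extent of representations) can be illustrated with the following sketch. All names, the list-based accumulator, and the dict "model" are hypothetical simplifications, not the patent's implementation.

```python
class Server:
    """Accumulates utterance representations; retrains only at a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.received = []

    def receive(self, representation):
        self.received.append(representation)
        if len(self.received) >= self.threshold:
            model = self.retrain(self.received)  # build updated acoustic model
            self.received.clear()
            return model                         # send back to the user device
        return None                              # threshold not yet reached

    def retrain(self, data):
        # Stand-in for acoustic model training on the received representations.
        return {"version": "updated", "trained_on": len(data)}

class UserDevice:
    def __init__(self, server):
        self.server = server
        self.acoustic_model = {"version": "initial"}
        self.adaptation_updates = 0

    def process(self, utterance_representation):
        self.adaptation_updates += 1  # adaptation parameters: every utterance
        new_model = self.server.receive(utterance_representation)
        if new_model is not None:
            # Acoustic model: replaced wholesale, and only occasionally.
            self.acoustic_model = new_model
```

With a threshold of 3, the device performs three adaptation updates but only one model replacement, matching the claimed relative update frequencies.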
7. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a user device, cause the user device to perform operations comprising:

performing automatic speech recognition (ASR) on received utterances, wherein performing the ASR includes: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors;

transmitting a representation of at least some of the utterances to a computing device for development of an updated acoustic model;

after transmitting the representation, receiving the updated acoustic model from the computing device, wherein the updated acoustic model is based on the representation; and

replacing the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 8, 9, 10, 11, 12 not shown.)
13. A computing device comprising:

an automatic speech recognition (ASR) module configured to generate feature vectors based on received utterances, update the feature vectors based on feature-space speaker adaptation parameters, obtain transcriptions of the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and update the feature-space speaker adaptation parameters based on the feature vectors; and

a communication module configured to transmit a representation of at least some of the utterances to a server device for development of an updated acoustic model, and, after transmitting the representation, receive the updated acoustic model from the server device, wherein the updated acoustic model is based on the representation, and wherein the ASR module is further configured to replace the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the server device has received a threshold extent of the representations from the computing device.

(Dependent claims 14, 15, 16, 17 not shown.)
18. A system comprising:

a user device, including an automatic speech recognition (ASR) module configured to perform ASR, the ASR including: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors; and

a computing device that receives representations of one or more of the utterances transcribed by the user device for development of an updated acoustic model, and after receiving the representations transmits the updated acoustic model, based on the representations, to the user device, wherein the user device replaces the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 19, 20, 21, 22 not shown.)
Specification