Distributed speaker adaptation
Abstract
Automatic speech recognition (ASR) may be performed on received utterances. The ASR may be performed by an ASR module of a computing device (e.g., a client device). The ASR may include: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, and updating the feature-space speaker adaptation parameters based on the feature vectors. The transcriptions may be based, at least in part, on an acoustic model and the updated feature vectors. Updated speaker adaptation parameters may be received from another computing device and incorporated into the ASR module.
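The client-side loop the abstract describes (generate feature vectors, adapt them in feature space, transcribe with the acoustic model, then refine the adaptation parameters) can be sketched as follows. This is a minimal illustrative sketch, not the patent's actual method: the class names, the identity-initialized affine transform, the toy mean-centering update, and the stub acoustic model are all assumptions.

```python
import numpy as np

class FeatureSpaceAdapter:
    """Hypothetical holder for feature-space speaker adaptation parameters,
    applied as an affine transform x' = A @ x + b (fMLLR-style in spirit)."""
    def __init__(self, dim):
        self.A = np.eye(dim)    # transform matrix, starts as identity
        self.b = np.zeros(dim)  # bias term

    def apply(self, feats):
        # feats: (num_frames, dim) -> adapted feature vectors
        return feats @ self.A.T + self.b

    def update(self, feats, lr=0.01):
        # Toy stand-in for a real adaptation step: nudge the bias so that
        # adapted features drift toward zero mean for this speaker.
        self.b -= lr * feats.mean(axis=0)

class ToyAcousticModel:
    """Stub in place of a real acoustic model; a real one decodes frames."""
    def transcribe(self, adapted_feats):
        return "<transcript of %d frames>" % len(adapted_feats)

def recognize(utterance_feats, adapter, acoustic_model):
    adapted = adapter.apply(utterance_feats)   # update feature vectors
    text = acoustic_model.transcribe(adapted)  # transcribe using the model
    adapter.update(utterance_feats)            # refine adaptation parameters
    return text
```

Note that the adaptation parameters change after every utterance, which is what lets them be updated far more often than the acoustic model itself.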
22 Claims
1. A method comprising:

performing, by a user device, automatic speech recognition (ASR) on received utterances, wherein performing the ASR includes: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors;

transmitting, by the user device, a representation of at least some of the utterances to a computing device for development of an updated acoustic model;

after transmitting the representation, receiving, by the user device, the updated acoustic model from the computing device, wherein the updated acoustic model is based on the representation; and

replacing, by the user device, the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 2, 3, 4, 5, 6 not shown.)
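The two-speed update cadence claimed above (per-utterance adaptation on the device, with the acoustic model replaced only once the server has received a threshold extent of representations) can be illustrated with the following sketch. All names, the list-based accumulator, and the dict "model" are hypothetical simplifications, not the patent's implementation.

```python
class Server:
    """Accumulates utterance representations; retrains only at a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.received = []

    def receive(self, representation):
        self.received.append(representation)
        if len(self.received) >= self.threshold:
            model = self.retrain(self.received)  # build updated acoustic model
            self.received.clear()
            return model                         # send back to the user device
        return None                              # threshold not yet reached

    def retrain(self, data):
        # Stand-in for acoustic model training on the received representations.
        return {"version": "updated", "trained_on": len(data)}

class UserDevice:
    def __init__(self, server):
        self.server = server
        self.acoustic_model = {"version": "initial"}
        self.adaptation_updates = 0

    def process(self, utterance_representation):
        self.adaptation_updates += 1  # adaptation parameters: every utterance
        new_model = self.server.receive(utterance_representation)
        if new_model is not None:
            # Acoustic model: replaced wholesale, and only occasionally.
            self.acoustic_model = new_model
```

With a threshold of 3, the device performs three adaptation updates but only one model replacement, matching the claimed relative update frequencies.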
7. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a user device, cause the user device to perform operations comprising:

performing automatic speech recognition (ASR) on received utterances, wherein performing the ASR includes: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors;

transmitting a representation of at least some of the utterances to a computing device for development of an updated acoustic model;

after transmitting the representation, receiving the updated acoustic model from the computing device, wherein the updated acoustic model is based on the representation; and

replacing the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 8, 9, 10, 11, 12 not shown.)
13. A computing device comprising:

an automatic speech recognition (ASR) module configured to generate feature vectors based on received utterances, update the feature vectors based on feature-space speaker adaptation parameters, obtain transcriptions of the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and update the feature-space speaker adaptation parameters based on the feature vectors; and

a communication module configured to transmit a representation of at least some of the utterances to a server device for development of an updated acoustic model, and, after transmitting the representation, receive the updated acoustic model from the server device, wherein the updated acoustic model is based on the representation, and wherein the ASR module is further configured to replace the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the server device has received a threshold extent of the representations from the computing device.

(Dependent claims 14, 15, 16, 17 not shown.)
18. A system comprising:

a user device, including an automatic speech recognition (ASR) module configured to perform ASR, the ASR including: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, wherein the transcriptions are based at least in part on an acoustic model and the updated feature vectors, and updating the feature-space speaker adaptation parameters based on the feature vectors; and

a computing device that receives representations of one or more of the utterances transcribed by the user device for development of an updated acoustic model, and after receiving the representations transmits the updated acoustic model, based on the representations, to the user device, wherein the user device replaces the acoustic model with the updated acoustic model, wherein the feature-space speaker adaptation parameters are updated more frequently than the acoustic model is updated, and wherein the acoustic model is updated when the computing device has received a threshold extent of the representations from the user device.

(Dependent claims 19, 20, 21, 22 not shown.)
Specification