DOMAIN ADAPTATION IN SPEECH RECOGNITION VIA TEACHER-STUDENT LEARNING
Abstract
Speech recognition in a new domain is improved through student/teacher training of models for different speech domains. A student model for the new domain is created from a teacher model trained in an existing domain. The student model is trained in parallel with the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels; instead, they are parallel to the data in the existing domain that the teacher model analyzes. The outputs of the teacher model are compared with the outputs of the student model, and the differences are used to adjust the parameters of the student model so that it better recognizes speech in the new domain.
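The comparison of teacher and student outputs described above can be illustrated with a small numeric sketch. The abstract does not name a specific difference measure, so the Kullback-Leibler divergence used here is an assumption, and the function names (`softmax`, `kl_divergence`, `mean_divergence`) and the toy scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into a posterior distribution over classes."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_post, student_post, eps=1e-12):
    """KL(teacher || student) for one frame (assumed measure, not from the source)."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher_post, student_post)
    )

def mean_divergence(teacher_frames, student_frames):
    """Average the per-frame divergence over a pair of parallel utterances."""
    pairs = list(zip(teacher_frames, student_frames))
    return sum(kl_divergence(t, s) for t, s in pairs) / len(pairs)

# Toy posteriors: two frames, three output classes (values are made up).
teacher_frames = [softmax([2.0, 0.5, 0.1]), softmax([0.2, 1.7, 0.3])]
student_frames = [softmax([1.0, 0.9, 0.4]), softmax([0.2, 1.7, 0.3])]

print(mean_divergence(teacher_frames, student_frames))
```

The divergence is zero when the student reproduces the teacher's posteriors exactly and grows as they drift apart, which is why it can serve as the signal for adjusting the student's parameters. No transcription labels appear anywhere in the computation.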
20 Claims
1. A system providing for adaption of speech recognition models for speech recognition in new domains, comprising:
a processor; and
a memory storage device including instructions that when executed by the processor enable the system to:
select a teacher model configured for speech recognition of utterances in a source domain;
produce a student model based on the teacher model for speech recognition of utterances in a target domain;
provide source domain utterances to the teacher model to produce teacher posteriors for the source domain utterances;
provide, in parallel to providing the source domain utterances, target domain utterances to the student model to produce student posteriors for the target domain utterances;
determine whether the student posteriors converge with the teacher posteriors;
in response to determining that the student posteriors and the teacher posteriors converge, finalize the student model for use in speech recognition in the target domain; and
in response to determining that the student posteriors and the teacher posteriors do not converge, update parameters of the student model based on divergences in the student posteriors and the teacher posteriors.
Dependent claims: 2, 3, 4, 5, 6, 7
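The steps recited in claim 1 can be sketched as a training loop. This is a minimal illustration under stated assumptions, not the claimed implementation: the models are tiny linear-softmax classifiers, the divergence is KL, the convergence test is a fixed threshold `tol`, and the target domain is simulated by rescaling the source features. All names (`adapt`, `posteriors`, `tol`, `lr`) and all numbers are hypothetical.

```python
import copy
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def posteriors(weights, features):
    """Class posteriors of a linear-softmax model for one feature frame."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])

def kl(teacher_post, student_post, eps=1e-12):
    return sum(t * (math.log(t + eps) - math.log(s + eps))
               for t, s in zip(teacher_post, student_post))

def adapt(teacher, source_frames, target_frames, lr=0.2, tol=1e-3, max_epochs=200):
    """Train a student on target frames to match teacher posteriors on source frames."""
    student = copy.deepcopy(teacher)      # student starts from the teacher
    history = []                          # mean divergence per epoch
    n = len(source_frames)
    for _ in range(max_epochs):
        grads = [[0.0] * len(row) for row in student]
        total = 0.0
        for src, tgt in zip(source_frames, target_frames):  # parallel inputs
            t_post = posteriors(teacher, src)
            s_post = posteriors(student, tgt)
            total += kl(t_post, s_post)
            for k in range(len(student)):
                for j, x in enumerate(tgt):
                    # Gradient of the divergence with respect to student weights.
                    grads[k][j] += (s_post[k] - t_post[k]) * x
        history.append(total / n)
        if history[-1] < tol:             # assumed convergence test
            break                         # converged: finalize the student
        for k in range(len(student)):     # not converged: update the student only
            for j in range(len(student[k])):
                student[k][j] -= lr * grads[k][j] / n
    return student, history

teacher = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
source_frames = [[1.0, 0.2], [0.1, 0.9], [0.7, 0.7], [-0.5, 0.4]]
# Simulated target domain: same utterances with rescaled features (assumption).
target_frames = [[0.5 * a, 2.0 * b] for a, b in source_frames]

student, history = adapt(teacher, source_frames, target_frames)
print(history[0], history[-1])
```

Only the student's weights change during the loop; the teacher stays fixed and serves purely as the source of soft targets.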
8. A method for adaption of speech recognition models for speech recognition in new domains, comprising:
receiving a selection of a first speech recognition model adapted for speech recognition of utterances in a first domain;
cloning the first speech recognition model to thereby produce a second speech recognition model;
providing a first dataset of utterances to the first speech recognition model and a second dataset of utterances to the second speech recognition model, wherein the first dataset includes utterances defined according to the first domain and the second dataset includes parallel utterances to those included in the first dataset that are defined according to a second domain;
determining whether posteriors produced by the second speech recognition model from the second dataset converge with posteriors produced by the first speech recognition model from the first dataset;
in response to determining that the posteriors converge, finalizing the second speech recognition model for use in speech recognition in the second domain; and
in response to determining that the posteriors do not converge, updating parameters of the second speech recognition model based on the posteriors.
Dependent claims: 9, 10, 11, 12, 13, 14
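Claim 8 relies on the second dataset containing utterances parallel to those in the first. One way to picture that pairing is sketched below. The claim does not say how parallel utterances are matched, so the shared identifier, the equal-frame-count check, and every name here (`Utterance`, `pair_parallel`, the domain labels) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[float]

@dataclass(frozen=True)
class Utterance:
    utt_id: str          # hypothetical key shared by parallel utterances
    domain: str
    frames: List[Frame]

def pair_parallel(first: List[Utterance],
                  second: List[Utterance]) -> List[Tuple[Utterance, Utterance]]:
    """Pair each first-domain utterance with its second-domain counterpart."""
    by_id = {u.utt_id: u for u in second}
    pairs = []
    for src in first:
        tgt = by_id.get(src.utt_id)
        if tgt is None:
            continue  # no parallel counterpart, so nothing to compare
        if len(src.frames) != len(tgt.frames):
            # Frame-by-frame posterior comparison needs aligned lengths (assumption).
            raise ValueError(f"frame count mismatch for {src.utt_id}")
        pairs.append((src, tgt))
    return pairs

first_dataset = [
    Utterance("utt1", "first", [[0.1, 0.2], [0.3, 0.4]]),
    Utterance("utt2", "first", [[0.5, 0.6]]),
]
second_dataset = [
    Utterance("utt1", "second", [[0.2, 0.1], [0.4, 0.3]]),
]

pairs = pair_parallel(first_dataset, second_dataset)
print([(s.utt_id, t.domain) for s, t in pairs])
```

Each pair supplies one input to the first model and one to the second, so the two sets of posteriors can be compared without any transcription of either utterance.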
15. A computer readable storage device including instructions that when executed by a processor provide for adaption of speech recognition models for speech recognition in new domains, comprising:
receiving a selection of a teacher model adapted for speech recognition of utterances in a source domain;
cloning the teacher model to produce a student model;
providing utterances according to the source domain to the teacher model in parallel to providing utterances according to a target domain to the student model;
determining whether posteriors produced by the student model from the target domain utterances converge with posteriors produced by the teacher model from the source domain utterances;
in response to determining that the posteriors converge, finalizing the student model for use in speech recognition in the target domain; and
in response to determining that the posteriors do not converge, updating parameters of the student model based on the posteriors.
Dependent claims: 16, 17, 18, 19, 20
Specification