Learning Student DNN Via Output Distribution
First Claim
1. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the computing system to perform a method for generating a DNN classifier for deployment on a computing device, the method comprising:
determining a first DNN model as a teacher DNN model;
initializing a second DNN model as a student DNN model;
receiving a set of un-labeled training data;
for a number of iterations:
(a) using a subset of the set of training data, determine a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(b) determine an evaluation of the student output distribution vs. the teacher output distribution;
(c) based on the evaluation, update the student DNN model; and
providing the student DNN model as a trained DNN classifier, wherein the number of iterations is based on the determined evaluation.
Abstract
Systems and methods are provided for generating a DNN classifier by “learning” a “student” DNN model from a larger, more accurate “teacher” DNN model. The student DNN may be trained from un-labeled training data because its supervised signal is obtained by passing the un-labeled training data through the teacher DNN. In one embodiment, an iterative process is applied to train the student DNN by minimizing the divergence of the output distributions from the teacher and student DNN models. For each iteration until convergence, the difference in the output distributions is used to update the student DNN model, and the output distributions are determined again using the un-labeled training data. The resulting trained student model may be suitable for providing accurate signal processing applications on devices having limited computational or storage resources, such as mobile or wearable devices. In an embodiment, the teacher DNN model comprises an ensemble of DNN models.
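The iterative procedure the abstract describes can be sketched as a minimal training loop. Everything concrete here is an illustrative assumption rather than the patented implementation: the teacher and student are stand-in linear-softmax models, and the learning rate, batch size, and iteration count are arbitrary. The one piece of real math relied on is that the gradient of KL(teacher ∥ student) with respect to the student's output logits is (p_student − p_teacher).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Mean KL divergence between teacher (p) and student (q) rows."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

# Hypothetical stand-ins: a frozen "teacher" and a "student", both plain
# linear-softmax models (a real student would typically be smaller).
D, K = 20, 5                             # input dim, number of classes
W_teacher = rng.normal(size=(D, K))      # frozen teacher weights
W_student = np.zeros((D, K))             # student initialized at zero

X = rng.normal(size=(256, D))            # un-labeled training data: inputs only

history = []
for step in range(200):
    # (a) subset of the training data; teacher and student output distributions
    batch = X[rng.choice(len(X), size=64, replace=False)]
    p_t = softmax(batch @ W_teacher)
    p_s = softmax(batch @ W_student)
    # (b) evaluation of the student distribution vs. the teacher distribution
    history.append(kl(p_t, p_s))
    # (c) update the student: the KL gradient at the student logits is
    # (p_s - p_t); back-propagate it through the linear layer.
    grad_W = batch.T @ (p_s - p_t) / len(batch)
    W_student -= 0.5 * grad_W
```

After the loop, `history` traces the divergence across iterations; it shrinks as the student's output distribution approaches the teacher's.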
89 Citations
20 Claims
1. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the computing system to perform a method for generating a DNN classifier for deployment on a computing device, the method comprising:
determining a first DNN model as a teacher DNN model;
initializing a second DNN model as a student DNN model;
receiving a set of un-labeled training data;
for a number of iterations:
(a) using a subset of the set of training data, determine a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(b) determine an evaluation of the student output distribution vs. the teacher output distribution;
(c) based on the evaluation, update the student DNN model; and
providing the student DNN model as a trained DNN classifier, wherein the number of iterations is based on the determined evaluation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer implemented method for generating a trained DNN model for deployment as a classifier on a computer system, the method comprising:
determining a plurality of DNN models to be included as sub-DNNs in an ensemble DNN model;
assembling the ensemble DNN model using the sub-DNNs, thereby making each of the plurality of sub-DNNs an ensemble member;
training the ensemble DNN model;
initializing a student DNN model;
training the student DNN model, using the trained ensemble DNN model as a teacher DNN; and
providing the student DNN model as a DNN classifier.
Dependent claims: 13, 14, 15, 16, 17, 18
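Claim 12 does not fix how the ensemble members' outputs are combined into a single teacher signal. A common assumption, sketched below, is to average the members' output distributions; the three linear-softmax "sub-DNNs" here are hypothetical stand-ins for trained ensemble members.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sub-DNNs: three independently initialized linear-softmax
# models standing in for trained ensemble members.
D, K = 10, 4
members = [rng.normal(size=(D, K)) for _ in range(3)]

def ensemble_teacher_distribution(X, members):
    """Average the members' output distributions to form the ensemble
    teacher's output distribution (an assumed combination rule)."""
    probs = [softmax(X @ W) for W in members]
    return np.mean(probs, axis=0)

X = rng.normal(size=(8, D))
p_teacher = ensemble_teacher_distribution(X, members)
```

Because each member's rows are valid probability distributions, their mean is as well, so the averaged output can be used directly as the teacher distribution when training the student.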
19. A DNN-based classifier deployed on a client device, the DNN-based classifier created according to a process comprising:
(a) determining a first DNN model as a teacher DNN model;
(b) initializing a second DNN model as a student DNN model;
(c) receiving a set of un-labeled training data;
(d) using a subset from the set of training data, determining a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(e) determining an evaluation of the student output distribution vs. the teacher output distribution;
(f) based on the evaluation, determining whether the student output distribution and the teacher output distribution have achieved convergence;
(i) if the student output distribution and the teacher output distribution are determined to have converged, then providing the student DNN model for deployment on the client device; and
(ii) if the student output distribution and the teacher output distribution are determined not to have converged, then updating the student DNN model based on the determined evaluation and repeating steps (d) through (f).
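Claim 19 leaves the convergence test of step (f) unspecified. One simple assumed criterion, sketched below, is that the teacher/student divergence has either fallen below a threshold or stopped improving between evaluations; both the KL choice and the tolerance are illustrative assumptions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Mean KL divergence between rows of distributions p and q."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def has_converged(p_teacher, p_student, prev_kl, tol=1e-3):
    """Return (converged, current_kl): converged when the divergence is
    tiny, or barely changed since the previous evaluation (assumed rule)."""
    cur = kl(p_teacher, p_student)
    return (cur < tol or abs(prev_kl - cur) < tol), cur

p = np.array([[0.7, 0.2, 0.1]])
done_same, cur_same = has_converged(p, p, prev_kl=1.0)          # identical distributions
done_diff, cur_diff = has_converged(p, np.array([[1/3, 1/3, 1/3]]), prev_kl=10.0)
```

Identical distributions yield zero divergence and an immediate stop; a clearly different student distribution leaves the loop running (steps (d) through (f) repeat).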
20. The DNN-based classifier created according to the process of claim 19, wherein the student DNN model is a CD-DNN-HMM, wherein Kullback-Leibler divergence is used to determine the evaluation of the student output distribution vs. the teacher output distribution, wherein the determined evaluation comprises an error signal, wherein the student DNN model is updated using back-propagation based on the error signal, and wherein the DNN-based classifier is deployed on the client device as part of an automatic speech recognition system.
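Claim 20 ties the pieces together: the KL divergence is the evaluation, and the error signal it yields is back-propagated. A short check of the underlying calculus (not the patented implementation): for loss KL(p_teacher ∥ p_student) with p_student = softmax(z), the error signal at the student's output logits z is (p_student − p_teacher), which the sketch below verifies against a finite-difference gradient.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL divergence between two probability vectors."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

p_t = np.array([0.6, 0.3, 0.1])     # teacher output distribution
z = np.array([0.2, -0.1, 0.4])      # student output logits (illustrative)
p_s = softmax(z)

# Analytic error signal: d KL(p_t || softmax(z)) / dz = p_s - p_t.
analytic = p_s - p_t

# Numerical check of the same gradient by central finite differences.
numeric = np.zeros_like(z)
h = 1e-6
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    numeric[i] = (kl(p_t, softmax(zp)) - kl(p_t, softmax(zm))) / (2 * h)
```

This (p_student − p_teacher) vector is exactly the quantity a back-propagation update would push backwards from the student's output layer.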
Specification