Learning Student DNN Via Output Distribution
First Claim
1. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the computing system to perform a method for generating a DNN classifier for deployment on a computing device, the method comprising:
determining a first DNN model as a teacher DNN model;
initializing a second DNN model as a student DNN model;
receiving a set of un-labeled training data;
for a number of iterations:
(a) using a subset of the set of training data, determine a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(b) determine an evaluation of the student output distribution vs. the teacher output distribution;
(c) based on the evaluation, update the student DNN model; and
providing the student DNN model as a trained DNN classifier, wherein the number of iterations is based on the determined evaluation.
Abstract
Systems and methods are provided for generating a DNN classifier by “learning” a “student” DNN model from a larger, more accurate “teacher” DNN model. The student DNN may be trained from un-labeled training data because its supervised signal is obtained by passing the un-labeled training data through the teacher DNN. In one embodiment, an iterative process is applied to train the student DNN by minimizing the divergence of the output distributions from the teacher and student DNN models. For each iteration until convergence, the difference in the output distributions is used to update the student DNN model, and the output distributions are determined again using the un-labeled training data. The resulting trained student model may be suitable for providing accurate signal processing applications on devices having limited computational or storage resources, such as mobile or wearable devices. In an embodiment, the teacher DNN model comprises an ensemble of DNN models.
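The iterative procedure the abstract describes can be sketched as a minimal training loop. Everything concrete here is an illustrative assumption rather than the patented implementation: the teacher and student are stand-in linear-softmax models, and the learning rate, batch size, and iteration count are arbitrary. The one piece of real math relied on is that the gradient of KL(teacher ∥ student) with respect to the student's output logits is (p_student − p_teacher).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Mean KL divergence between teacher (p) and student (q) rows."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

# Hypothetical stand-ins: a frozen "teacher" and a "student", both plain
# linear-softmax models (a real student would typically be smaller).
D, K = 20, 5                             # input dim, number of classes
W_teacher = rng.normal(size=(D, K))      # frozen teacher weights
W_student = np.zeros((D, K))             # student initialized at zero

X = rng.normal(size=(256, D))            # un-labeled training data: inputs only

history = []
for step in range(200):
    # (a) subset of the training data; teacher and student output distributions
    batch = X[rng.choice(len(X), size=64, replace=False)]
    p_t = softmax(batch @ W_teacher)
    p_s = softmax(batch @ W_student)
    # (b) evaluation of the student distribution vs. the teacher distribution
    history.append(kl(p_t, p_s))
    # (c) update the student: the KL gradient at the student logits is
    # (p_s - p_t); back-propagate it through the linear layer.
    grad_W = batch.T @ (p_s - p_t) / len(batch)
    W_student -= 0.5 * grad_W
```

After the loop, `history` traces the divergence across iterations; it shrinks as the student's output distribution approaches the teacher's.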
89 Citations
20 Claims
1. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the computing system to perform a method for generating a DNN classifier for deployment on a computing device, the method comprising:
determining a first DNN model as a teacher DNN model;
initializing a second DNN model as a student DNN model;
receiving a set of un-labeled training data;
for a number of iterations:
(a) using a subset of the set of training data, determine a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(b) determine an evaluation of the student output distribution vs. the teacher output distribution;
(c) based on the evaluation, update the student DNN model; and
providing the student DNN model as a trained DNN classifier, wherein the number of iterations is based on the determined evaluation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer implemented method for generating a trained DNN model for deployment as a classifier on a computer system, the method comprising:
determining a plurality of DNN models to be included as sub-DNNs in an ensemble DNN model;
assembling the ensemble DNN model using the sub-DNNs, thereby making each of the plurality of sub-DNNs an ensemble member;
training the ensemble DNN model;
initializing a student DNN model;
training the student DNN model, using the trained ensemble DNN model as a teacher DNN; and
providing the student DNN model as a DNN classifier.
Dependent claims: 13, 14, 15, 16, 17, 18
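Claim 12 does not fix how the ensemble members' outputs are combined into a single teacher signal. A common assumption, sketched below, is to average the members' output distributions; the three linear-softmax "sub-DNNs" here are hypothetical stand-ins for trained ensemble members.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sub-DNNs: three independently initialized linear-softmax
# models standing in for trained ensemble members.
D, K = 10, 4
members = [rng.normal(size=(D, K)) for _ in range(3)]

def ensemble_teacher_distribution(X, members):
    """Average the members' output distributions to form the ensemble
    teacher's output distribution (an assumed combination rule)."""
    probs = [softmax(X @ W) for W in members]
    return np.mean(probs, axis=0)

X = rng.normal(size=(8, D))
p_teacher = ensemble_teacher_distribution(X, members)
```

Because each member's rows are valid probability distributions, their mean is as well, so the averaged output can be used directly as the teacher distribution when training the student.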
19. A DNN-based classifier deployed on a client device, the DNN-based classifier created according to a process comprising:
(a) determining a first DNN model as a teacher DNN model;
(b) initializing a second DNN model as a student DNN model;
(c) receiving a set of un-labeled training data;
(d) using a subset from the set of training data, determining a teacher output distribution for the teacher DNN model and a student output distribution for the student DNN model;
(e) determining an evaluation of the student output distribution vs. the teacher output distribution;
(f) based on the evaluation, determining whether the student output distribution and the teacher output distribution have achieved convergence;
(i) if the student output distribution and the teacher output distribution are determined to have converged, then providing the student DNN model for deployment on the client device; and
(ii) if the student output distribution and the teacher output distribution are determined not to have converged, then updating the student DNN model based on the determined evaluation and repeating steps (d) through (f).
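Claim 19 leaves the convergence test of step (f) unspecified. One simple assumed criterion, sketched below, is that the teacher/student divergence has either fallen below a threshold or stopped improving between evaluations; both the KL choice and the tolerance are illustrative assumptions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Mean KL divergence between rows of distributions p and q."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def has_converged(p_teacher, p_student, prev_kl, tol=1e-3):
    """Return (converged, current_kl): converged when the divergence is
    tiny, or barely changed since the previous evaluation (assumed rule)."""
    cur = kl(p_teacher, p_student)
    return (cur < tol or abs(prev_kl - cur) < tol), cur

p = np.array([[0.7, 0.2, 0.1]])
done_same, cur_same = has_converged(p, p, prev_kl=1.0)          # identical distributions
done_diff, cur_diff = has_converged(p, np.array([[1/3, 1/3, 1/3]]), prev_kl=10.0)
```

Identical distributions yield zero divergence and an immediate stop; a clearly different student distribution leaves the loop running (steps (d) through (f) repeat).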
20. The DNN-based classifier created according to the process of claim 19, wherein the student DNN model is a CD-DNN-HMM, wherein Kullback-Leibler divergence is used to determine the evaluation of the student output distribution vs. the teacher output distribution, wherein the determined evaluation comprises an error signal, wherein the student DNN model is updated using back-propagation based on the error signal, and wherein the DNN-based classifier is deployed on the client device as part of an automatic speech recognition system.
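Claim 20 ties the pieces together: the KL divergence is the evaluation, and the error signal it yields is back-propagated. A short check of the underlying calculus (not the patented implementation): for loss KL(p_teacher ∥ p_student) with p_student = softmax(z), the error signal at the student's output logits z is (p_student − p_teacher), which the sketch below verifies against a finite-difference gradient.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL divergence between two probability vectors."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

p_t = np.array([0.6, 0.3, 0.1])     # teacher output distribution
z = np.array([0.2, -0.1, 0.4])      # student output logits (illustrative)
p_s = softmax(z)

# Analytic error signal: d KL(p_t || softmax(z)) / dz = p_s - p_t.
analytic = p_s - p_t

# Numerical check of the same gradient by central finite differences.
numeric = np.zeros_like(z)
h = 1e-6
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += h
    zm[i] -= h
    numeric[i] = (kl(p_t, softmax(zp)) - kl(p_t, softmax(zm))) / (2 * h)
```

This (p_student − p_teacher) vector is exactly the quantity a back-propagation update would push backwards from the student's output layer.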
Specification