
Multi-task learning using knowledge distillation

  • US 10,635,977 B2
  • Filed: 07/01/2019
  • Issued: 04/28/2020
  • Est. Priority Date: 12/30/2016
  • Status: Active Grant
First Claim

1. A computer implemented method comprising:

  • obtaining a respective set of training data for each of a plurality of machine learning tasks;
  • for each of the machine learning tasks, configuring a respective teacher machine learning model to perform the machine learning task by training the teacher machine learning model on the training data for the task; and
  • training a single student machine learning model having a plurality of student machine learning model parameters to perform all of the plurality of machine learning tasks using (i) the configured teacher machine learning models, and (ii) the obtained training data, wherein training the single student machine learning model comprises:
    • for each of the plurality of machine learning tasks:
      • selecting one or more subsets from the set of training data for the machine learning task;
      • processing the selected subsets using the respective teacher machine learning model to generate respective teacher machine learning model outputs; and
      • training the single student machine learning model to perform the machine learning task using (i) the selected one or more subsets, and (ii) the respective generated teacher machine learning model outputs, comprising, for each subset:
        • augmenting the subset with an identifier for the machine learning task;
        • processing the augmented subset using the student machine learning model to generate a student machine learning model output; and
        • adjusting values of the student machine learning model parameters to match the generated student machine learning model output to the respective generated teacher machine learning model output for the subset.
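
As a rough illustration only, not part of the claim text, the sketch below shows one way the claimed training loop could look in PyTorch: per-task teachers produce outputs on selected subsets, each subset is augmented with a task identifier, and a single student is adjusted to match the teacher output. The network shapes, the one-hot encoding of the task identifier, and the KL-divergence loss used to "match" outputs are all assumptions; the claim does not prescribe any of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TASKS = 3    # illustrative values, not from the patent
INPUT_DIM = 16
OUTPUT_DIM = 8

# Single student: consumes the input augmented with a one-hot task
# identifier, so one set of parameters serves every task.
student = nn.Sequential(
    nn.Linear(INPUT_DIM + NUM_TASKS, 64),
    nn.ReLU(),
    nn.Linear(64, OUTPUT_DIM),
)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def augment_with_task_id(batch, task_idx):
    """Append a one-hot task identifier to each example in the batch."""
    one_hot = F.one_hot(
        torch.full((batch.size(0),), task_idx, dtype=torch.long),
        num_classes=NUM_TASKS,
    ).float()
    return torch.cat([batch, one_hot], dim=1)

def distillation_pass(teachers, loaders):
    """For each task: select subsets (batches), generate the teacher's
    output on each, and adjust the student so its output on the
    task-augmented subset matches the teacher's output."""
    for task_idx, (teacher, loader) in enumerate(zip(teachers, loaders)):
        for batch in loader:                  # selected subsets
            with torch.no_grad():
                teacher_out = teacher(batch)  # teacher model output
            student_out = student(augment_with_task_id(batch, task_idx))
            # KL on softened outputs is one common way to "match" the
            # student and teacher outputs (an assumption; the claim only
            # requires matching, not a particular loss).
            loss = F.kl_div(
                F.log_softmax(student_out, dim=1),
                F.softmax(teacher_out, dim=1),
                reduction="batchmean",
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stand-ins for trained per-task teachers and their training data.
teachers = [nn.Linear(INPUT_DIM, OUTPUT_DIM) for _ in range(NUM_TASKS)]
loaders = [[torch.randn(4, INPUT_DIM)] for _ in range(NUM_TASKS)]
distillation_pass(teachers, loaders)
```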
