PARALLELIZING THE TRAINING OF CONVOLUTIONAL NEURAL NETWORKS
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a convolutional neural network (CNN). The system includes a plurality of workers, wherein each worker is configured to maintain a respective replica of each of the convolutional layers of the CNN and a respective disjoint partition of each of the fully-connected layers of the CNN, wherein each replica of a convolutional layer includes all of the nodes in the convolutional layer, and wherein each disjoint partition of a fully-connected layer includes a portion of the nodes of the fully-connected layer.
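The worker layout the abstract describes (a full replica of every convolutional layer on each worker, and a disjoint partition of every fully-connected layer) can be sketched as follows. This is a minimal NumPy sketch; the worker count, layer sizes, and the `Worker` class are invented for illustration and are not taken from the patent.

```python
import numpy as np

NUM_WORKERS = 4                                  # hypothetical worker count
CONV_SHAPES = [(16, 3, 3, 3), (32, 16, 3, 3)]    # (filters, channels, h, w)
FC_DIMS = [(512, 256), (256, 64)]                # (inputs, nodes) per FC layer

class Worker:
    """One worker: a full replica of every convolutional layer and a
    disjoint partition (a slice of the nodes) of every fully-connected layer."""
    def __init__(self, index, rng):
        self.index = index
        # Replica: all of the nodes of each convolutional layer.
        self.conv = [rng.standard_normal(shape) for shape in CONV_SHAPES]
        # Disjoint partition: only this worker's equal share of each
        # fully-connected layer's nodes (here, a slice of the output units).
        self.fc = [rng.standard_normal((n_in, n_out // NUM_WORKERS))
                   for n_in, n_out in FC_DIMS]

rng = np.random.default_rng(0)
workers = [Worker(i, rng) for i in range(NUM_WORKERS)]

# Every worker replicates the conv layers in full ...
assert all(w.conv[0].shape == (16, 3, 3, 3) for w in workers)
# ... but holds only 256 // 4 = 64 of the first FC layer's 256 nodes.
assert workers[0].fc[0].shape == (512, 64)
```

Because the fully-connected partitions are disjoint and equal, the union of all workers' slices covers each fully-connected layer exactly once.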
20 Claims
1. A system for training a convolutional neural network on a plurality of batches of training examples, the convolutional neural network having a plurality of layers arranged in a sequence from lowest to highest, the sequence including one or more convolutional layers followed by one or more fully-connected layers, each convolutional layer and each fully-connected layer comprising a respective plurality of nodes, the system comprising:
a plurality of workers, wherein each worker is configured to maintain a respective replica of each of the convolutional layers and a respective disjoint partition of each of the fully-connected layers, wherein each replica of a convolutional layer includes all of the nodes in the convolutional layer, wherein each disjoint partition of a fully-connected layer includes a portion of the nodes of the fully-connected layer, and wherein each worker is configured to perform operations comprising:
receiving a batch of training examples assigned to the worker, wherein the batches of training examples are assigned such that each worker receives a respective batch of the plurality of batches;
training the convolutional layer replica maintained by the worker on the batch of training examples assigned to the worker; and
training the fully-connected layer partitions maintained by the worker on each of the plurality of batches of training examples.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
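Claim 1 splits the work two ways: each worker trains its convolutional replica only on its own assigned batch (data parallelism), while each worker's fully-connected partition is trained on every batch (model parallelism). A rough NumPy sketch of one forward step, with the convolutional replica reduced to a single linear map plus ReLU and all sizes invented for illustration:

```python
import numpy as np

NUM_WORKERS = 4
BATCH, FEAT, FC_OUT = 8, 32, 16          # hypothetical sizes
rng = np.random.default_rng(0)

# One batch of training examples assigned to each worker.
batches = [rng.standard_normal((BATCH, FEAT)) for _ in range(NUM_WORKERS)]

# Stand-in for each worker's conv-layer replica (identical on all workers).
conv_w = rng.standard_normal((FEAT, FEAT))

# Each worker's disjoint FC partition: an equal slice of the layer's nodes.
share = FC_OUT // NUM_WORKERS
fc_parts = [rng.standard_normal((FEAT, share)) for _ in range(NUM_WORKERS)]

# Step 1: every worker runs its conv replica on *its own* batch only.
conv_acts = [np.maximum(b @ conv_w, 0.0) for b in batches]

# Step 2: the conv activations are pooled so each worker's FC partition
# processes *every* batch of training examples.
all_acts = np.concatenate(conv_acts, axis=0)      # (NUM_WORKERS*BATCH, FEAT)
fc_outs = [all_acts @ p for p in fc_parts]        # each worker: its node slice
full_fc_output = np.concatenate(fc_outs, axis=1)  # (NUM_WORKERS*BATCH, FC_OUT)

assert full_fc_output.shape == (NUM_WORKERS * BATCH, FC_OUT)
```

The final concatenation along the node axis reflects that the partitions are disjoint: stacking every worker's slice of output units reconstructs the full fully-connected layer's output for all batches.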
14. A method for training a convolutional neural network on a plurality of batches of training examples, the convolutional neural network having a plurality of layers arranged in a sequence from lowest to highest, the sequence including one or more convolutional layers followed by one or more fully-connected layers, each convolutional layer and each fully-connected layer comprising a respective plurality of nodes, the method comprising:
maintaining, by each of a plurality of workers, a respective replica of each of the convolutional layers, wherein each replica of a convolutional layer includes all of the nodes in the convolutional layer;
maintaining, by each of the workers, a respective disjoint partition of each of the fully-connected layers, wherein each disjoint partition of a fully-connected layer includes a portion of the nodes of the fully-connected layer;
receiving, by each of the workers, a batch of training examples assigned to the worker, wherein the batches of training examples are assigned such that each worker receives a respective batch of the plurality of batches;
training, by each of the workers, the convolutional layer replica maintained by the worker on the batch of training examples assigned to the worker; and
training, by each of the workers, the fully-connected layer partitions maintained by the worker on each of the plurality of batches of training examples.
(Dependent claims: 15, 16, 17, 18, 19)
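None of the claims shown here recites how the convolutional replicas stay identical across workers once each has been trained on a different batch. One common approach, assumed here for illustration rather than taken from the claims, is to average the per-worker gradients and apply the same update everywhere:

```python
import numpy as np

NUM_WORKERS = 4
FEAT = 32                                     # hypothetical conv-weight size
rng = np.random.default_rng(1)

conv_w = rng.standard_normal((FEAT, FEAT))    # identical starting replica
# Each worker derives a gradient from its own batch only, so the raw
# gradients differ from worker to worker.
grads = [rng.standard_normal((FEAT, FEAT)) for _ in range(NUM_WORKERS)]

# Assumed synchronization step (not recited in the claims above):
# averaging the gradients and applying the same update on every worker
# keeps all convolutional replicas identical after the step.
avg_grad = sum(grads) / NUM_WORKERS
lr = 0.01
replicas = [conv_w - lr * avg_grad for _ in range(NUM_WORKERS)]

assert all(np.allclose(r, replicas[0]) for r in replicas)
```

The fully-connected partitions need no such exchange of gradients between partitions: each worker owns its slice of nodes outright and updates it locally using every batch.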
20. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a convolutional neural network on a plurality of batches of training examples, the convolutional neural network having a plurality of layers arranged in a sequence from lowest to highest, the sequence including one or more convolutional layers followed by one or more fully-connected layers, each convolutional layer and each fully-connected layer comprising a respective plurality of nodes, the operations comprising:
maintaining, by each of a plurality of workers, a respective replica of each of the convolutional layers, wherein each replica of a convolutional layer includes all of the nodes in the convolutional layer;
maintaining, by each of the workers, a respective disjoint partition of each of the fully-connected layers, wherein each disjoint partition of a fully-connected layer includes a portion of the nodes of the fully-connected layer;
receiving, by each of the workers, a batch of training examples assigned to the worker, wherein the batches of training examples are assigned such that each worker receives a respective batch of the plurality of batches;
training, by each of the workers, the convolutional layer replica maintained by the worker on the batch of training examples assigned to the worker; and
training, by each of the workers, the fully-connected layer partitions maintained by the worker on each of the plurality of batches of training examples.