Batch processing in a neural network processor

US 9,842,293 B2
Filed: 12/22/2016
Issued: 12/12/2017
Est. Priority Date: 05/21/2015
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a respective neural network output for each of a plurality of inputs, the method comprising, for each of the neural network layers: receiving a plurality of inputs to be processed at the neural network layer; forming one or more batches of inputs from the plurality of inputs, each batch having a number of inputs up to the respective batch size for the neural network layer; selecting a number of the one or more batches of inputs to process, where a count of the inputs in the number of the one or more batches is greater than or equal to the respective associated batch size of a subsequent layer in the sequence; and processing the number of the one or more batches of inputs to generate the respective neural network layer output.
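The per-layer procedure in the abstract can be illustrated with a small scheduling simulation. The Python below is a minimal sketch under stated assumptions, not the patented implementation. It assumes the layers form a simple chain rather than a general directed graph, that each input to a layer yields exactly one output for the next layer, and that only input counts need tracking. The names `batches_to_select` and `schedule` are hypothetical. Following the abstract, each layer selects the smallest number of full batches whose input count is greater than or equal to the next layer's batch size; the issued claims also allow a count below it.

```python
from typing import List, Tuple


def batches_to_select(batch_size: int, next_batch_size: int) -> int:
    """Smallest number of full batches whose input count is >= next_batch_size."""
    return max(1, -(-next_batch_size // batch_size))  # ceiling division


def schedule(batch_sizes: List[int], num_inputs: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Simulate layer-by-layer batch processing for a hypothetical chain of layers.

    Returns (steps, leftover): each step is (layer_index, inputs_processed) and
    leftover[i] is the number of inputs still queued at layer i once no layer
    can fill its selected batches.
    """
    n = len(batch_sizes)
    queues = [0] * n
    queues[0] = num_inputs
    steps: List[Tuple[int, int]] = []
    while True:
        # Try deeper layers first so intermediate results drain toward the output.
        for i in reversed(range(n)):
            # The last layer has no successor, so it simply processes one batch.
            next_size = batch_sizes[i + 1] if i + 1 < n else batch_sizes[i]
            count = batches_to_select(batch_sizes[i], next_size) * batch_sizes[i]
            if queues[i] >= count:
                queues[i] -= count
                if i + 1 < n:
                    queues[i + 1] += count  # assumed: one output per input
                steps.append((i, count))
                break
        else:
            # No layer has enough queued inputs to fill its selected batches.
            return steps, queues
```

With hypothetical batch sizes of 2 and 3 and twelve inputs, `schedule([2, 3], 12)` alternates between the two layers and returns the steps `[(0, 4), (1, 3), (0, 4), (1, 3), (0, 4), (1, 3), (1, 3)]` with nothing left queued: the first layer always processes two batches (four inputs) so that the second layer can fill at least one batch of three. Preferring deeper layers is a design choice of this sketch that keeps intermediate queues short.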

24 Claims

1. A method for generating a respective neural network output for each of a plurality of inputs, wherein the generating comprises processing each input through each of a plurality of neural network layers to generate the respective neural network output for the input, wherein the neural network layers are arranged in a directed graph structure, and wherein each neural network layer has a respective batch size, the method comprising, for each of the neural network layers:
- receiving a plurality of inputs to be processed at the neural network layer;
- forming one or more batches of inputs from the plurality of inputs, each batch having a number of inputs equal to the respective batch size for the neural network layer, where the respective batch size is based at least on a weight reuse value, the weight reuse value representing a number of times that weight inputs need to be reused for a compute time of output values using the weight inputs at a hardware matrix computation unit of a neural network hardware circuit to be longer than a load time of the weight inputs from memory;
- selecting a number of the one or more batches of inputs to process, where a count of the inputs in the number of the one or more batches is greater than, less than, or equal to the respective associated batch size of a subsequent layer in the directed graph structure; and
- processing, at the neural network hardware circuit and using the hardware matrix computation unit, the number of the one or more batches of inputs to generate the respective neural network layer output.

2. The method of claim 1, where the weight reuse value is based at least on a clock rate of the memory storing the weight inputs.
3. The method of claim 1, where each batch size is based at least on the weight reuse value divided by a number of times that weight inputs for the respective layer are reused.
4. The method of claim 1, where the plurality of neural network layers is processed at a matrix processing unit, where processing the number of the one or more batches of inputs comprises computing accumulated values for each input using the hardware matrix computation unit.
5. The method of claim 1, where each input corresponds to a distinct image resource.
6. The method of claim 1, where each input corresponds to an audio sample.
7. The method of claim 1, further comprising forming a batch from the one or more layer outputs for processing at the subsequent layer.
8. The method of claim 1, further comprising generating, for each output, a corresponding inference.
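Claims 1 to 3 tie each layer's batch size to a weight reuse value: the number of times the weights must be reused so that compute time exceeds the time to load them from memory, a value that may depend on the memory clock rate (claim 2) and that is divided by the layer's own per-input reuse (claim 3). The Python below is a minimal worked sketch of that arithmetic under assumptions, not the patent's formula: it uses a hypothetical bandwidth model for load time, reads "longer than" as a strict inequality, rounds the division up to a whole input, and the function and parameter names are all hypothetical.

```python
from fractions import Fraction


def weight_load_time(weight_bytes: int, bytes_per_mem_cycle: int, mem_clock_hz: int) -> Fraction:
    """Seconds to load one layer's weights under an assumed simple bandwidth model."""
    return Fraction(weight_bytes, bytes_per_mem_cycle * mem_clock_hz)


def weight_reuse_value(load_time: Fraction, compute_time_per_use: Fraction) -> int:
    """Smallest reuse count r such that r * compute_time_per_use > load_time."""
    return int(load_time // compute_time_per_use) + 1


def layer_batch_size(reuse_value: int, reuses_per_input: int) -> int:
    """Weight reuse value divided by per-input reuse, rounded up (ceiling division)."""
    return max(1, -(-reuse_value // reuses_per_input))
```

For example, with a hypothetical 64 KiB (65,536 bytes) of weights, a memory delivering 64 bytes per cycle at 1 GHz, and 3 ns of matrix-unit time per use of the weights, the load takes 1.024 µs and `weight_reuse_value` returns 342. A layer that uses its weights once per input would then get a batch size of 342, while a layer that reuses its weights 9 times per input (for instance, a small convolution applied at 9 positions) would get 38. Doubling the memory clock to 2 GHz halves the load time and lowers the reuse value to 171, which illustrates how the reuse value can depend on the memory clock rate.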

9. A system for generating a respective neural network output for each of a plurality of inputs, wherein the generating comprises processing each input through each of a plurality of neural network layers to generate the respective neural network output for the input, wherein the neural network layers are arranged in a directed graph structure, and wherein each neural network layer has a respective batch size, the system comprising:
- one or more computers; and
- a non-transitory computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to, for each of the neural network layers, perform operations comprising;
  - receiving a plurality of inputs to be processed at the neural network layer;
  - forming one or more batches of inputs from the plurality of inputs, each batch having a number of inputs equal to the respective batch size for the neural network layer, where the respective batch size is based at least on a weight reuse value, the weight reuse value representing a number of times that weight inputs need to be reused for a compute time of output values using the weight inputs at a hardware matrix computation unit of a neural network hardware circuit to be longer than a load time of the weight inputs from memory;
  - selecting a number of the one or more batches of inputs to process, where a count of the inputs in the number of the one or more batches is greater than, less than, or equal to the respective associated batch size of a subsequent layer in the directed graph structure; and
  - processing, at the neural network hardware circuit and using the hardware matrix computation unit, the number of the one or more batches of inputs to generate the respective neural network layer output.

10. The system of claim 9, where the weight reuse value is based at least on a clock rate of the memory storing the weight inputs.
11. The system of claim 9, where each batch size is based at least on the weight reuse value divided by a number of times that weight inputs for the respective layer are reused.
12. The system of claim 9, where the plurality of neural network layers is processed at a matrix processing unit, where processing the number of the one or more batches of inputs comprises computing accumulated values for each input using the hardware matrix computation unit.
13. The system of claim 9, where each input corresponds to a distinct image resource.
14. The system of claim 9, where each input corresponds to an audio sample.
15. The system of claim 9, further comprising forming a batch from the one or more layer outputs for processing at the subsequent layer.
16. The system of claim 9, further comprising generating, for each output, a corresponding inference.
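Claims 4, 12 and 20 recite computing accumulated values for each input using the hardware matrix computation unit. The Python below is a minimal software stand-in under assumptions, not a model of the actual circuit: `MatrixUnitSim` and `process_batches` are hypothetical names, the weights are a plain integer matrix, and the counters exist only to show that one weight load is amortized over every input in the selected batches.

```python
from typing import List

Matrix = List[List[int]]


class MatrixUnitSim:
    """Hypothetical software stand-in for a hardware matrix computation unit."""

    def __init__(self) -> None:
        self.weights: Matrix = []
        self.weight_loads = 0  # how many times weights were fetched from memory
        self.macs = 0  # multiply-accumulate operations performed

    def load_weights(self, weights: Matrix) -> None:
        self.weights = [row[:] for row in weights]
        self.weight_loads += 1

    def accumulate(self, x: List[int]) -> List[int]:
        """Accumulated values for one input: acc[j] = sum over i of x[i] * weights[i][j]."""
        if len(x) != len(self.weights):
            raise ValueError("input length must equal the number of weight rows")
        acc = [0] * len(self.weights[0])
        for xi, row in zip(x, self.weights):
            for j, w in enumerate(row):
                acc[j] += xi * w  # multiply-accumulate into the running sum
                self.macs += 1
        return acc


def process_batches(unit: MatrixUnitSim, weights: Matrix, batches: List[List[List[int]]]) -> List[List[int]]:
    """Load the weights once, then reuse them for every input in every selected batch."""
    unit.load_weights(weights)
    return [unit.accumulate(x) for batch in batches for x in batch]
```

For example, with weights `[[1, 2], [3, 4]]` and two batches `[[1, 0], [0, 1]]` and `[[1, 1]]`, `process_batches` returns `[[1, 2], [3, 4], [4, 6]]` after a single weight load and 12 multiply-accumulates. The more inputs share one load, the more compute time is available to hide the load time, which is the trade-off the weight reuse value captures.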

17. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more computers, cause the one or more computers to perform operations for generating a respective neural network output for each of a plurality of inputs, wherein the generating comprises processing each input through each of a plurality of neural network layers to generate the respective neural network output for the input, wherein the neural network layers are arranged in a directed graph structure, and wherein each neural network layer has a respective batch size, the operations comprising, for each of the neural network layers:
- receiving a plurality of inputs to be processed at the neural network layer;
- forming one or more batches of inputs from the plurality of inputs, each batch having a number of inputs equal to the respective batch size for the neural network layer, where the respective batch size is based at least on a weight reuse value, the weight reuse value representing a number of times that weight inputs need to be reused for a compute time of output values using the weight inputs at a hardware matrix computation unit of a neural network hardware circuit to be longer than a load time of the weight inputs from memory;
- selecting a number of the one or more batches of inputs to process, where a count of the inputs in the number of the one or more batches is greater than, less than, or equal to the respective associated batch size of a subsequent layer in the directed graph structure; and
- processing, at the neural network hardware circuit and using the hardware matrix computation unit, the number of the one or more batches of inputs to generate the respective neural network layer output.

18. The computer-readable medium of claim 17, where the weight reuse value is based at least on a clock rate of the memory storing the weight inputs.
19. The computer-readable medium of claim 17, where each batch size is based at least on the weight reuse value divided by a number of times that weight inputs for the respective layer are reused.
20. The computer-readable medium of claim 17, where the plurality of neural network layers is processed at a matrix processing unit, where processing the number of the one or more batches of inputs comprises computing accumulated values for each input using the hardware matrix computation unit.
21. The computer-readable medium of claim 17, where each input corresponds to a distinct image resource.
22. The computer-readable medium of claim 17, where each input corresponds to an audio sample.
23. The computer-readable medium of claim 17, further comprising forming a batch from the one or more layer outputs for processing at the subsequent layer.
24. The computer-readable medium of claim 17, further comprising generating, for each output, a corresponding inference.
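Claims 7, 15 and 23 add forming a batch from the layer outputs for processing at the subsequent layer. One way to picture this is a buffer between two layers that collects outputs and releases them in batches of the next layer's batch size. The Python below is a minimal sketch of that idea; `OutputBatcher` and its methods are hypothetical names, and carrying incomplete batches over to later calls is an assumption of this sketch rather than something the claims specify.

```python
from collections import deque
from typing import Deque, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class OutputBatcher(Generic[T]):
    """Collects one layer's outputs and releases them as batches for the next layer."""

    def __init__(self, next_batch_size: int) -> None:
        if next_batch_size < 1:
            raise ValueError("batch size must be positive")
        self.next_batch_size = next_batch_size
        self._pending: Deque[T] = deque()  # outputs not yet placed in a batch

    def add(self, outputs: Iterable[T]) -> List[List[T]]:
        """Buffer new outputs and return every full batch that is now available."""
        self._pending.extend(outputs)
        ready: List[List[T]] = []
        while len(self._pending) >= self.next_batch_size:
            ready.append([self._pending.popleft() for _ in range(self.next_batch_size)])
        return ready

    def pending(self) -> int:
        """Number of outputs waiting for a full batch."""
        return len(self._pending)
```

For example, with a next-layer batch size of 3, adding the outputs `[1, 2, 3, 4]` releases one batch `[1, 2, 3]` and leaves one output pending; adding `[5, 6]` then releases `[4, 5, 6]` and empties the buffer, while the outputs keep their original order.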

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Young, Reginald Clifford
Primary Examiner(s): Gonzales, Vincent
Application Number: US15/389,345
Publication Number: US 20170103317A1
Time in Patent Office: 355 Days

CPC Class Codes
G06N 3/06   Physical realisation, i.e. ...
G06N 3/063   using electronic means
G06N 3/08   Learning methods
G06N 5/04   Inference or reasoning models
