Batch processing in a neural network processor
First Claim
1. A method for performing neural network computations using hardware circuitry comprising a hardware matrix computation unit, the neural network computations being for a neural network having a plurality of neural network layers, the method comprising:
- obtaining, using the hardware circuitry, a plurality of layer inputs to be processed;
- based on (i) a size of a layer input to a particular neural network layer of the plurality of neural network layers and (ii) a weight reuse value representing a number of times that the hardware matrix computation unit of the hardware circuitry reuses weight inputs for neural network computations, determining, using the hardware circuitry, a batch size for the particular neural network layer, wherein the batch size represents a number of batches to be processed in parallel by the hardware matrix computation unit for the particular neural network layer; and
- processing, by the hardware matrix computation unit and for the particular neural network layer, one or more batches of layer inputs to generate one or more layer outputs, wherein each batch of the one or more batches includes a number of layer inputs corresponding to the batch size for the particular neural network layer.
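As a minimal sketch of the batch-size determination the claim recites: the function below assumes a simple cost model (each layer input occupies the matrix computation unit for a number of passes proportional to its size), and the function and parameter names are hypothetical, not language from the claims.

```python
import math

def determine_batch_size(layer_input_size: int, weight_reuse_value: int) -> int:
    """Derive a per-layer batch size from (i) the size of a layer input and
    (ii) how many times the matrix computation unit reuses loaded weights.

    The linear cost model below (one pass per unit of input size) is an
    assumption for illustration only.
    """
    passes_per_input = max(1, layer_input_size)  # assumed cost model
    return max(1, math.ceil(weight_reuse_value / passes_per_input))

# Hypothetical example: inputs of size 8 on hardware that reuses weights
# 1024 times yields a batch size of 128.
print(determine_batch_size(layer_input_size=8, weight_reuse_value=1024))  # 128
```

The intuition is that once weights are loaded into the matrix computation unit, keeping them resident for an entire batch amortizes the load across many inputs, so the batch size grows with the weight reuse value and shrinks as each input consumes more of that reuse budget.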
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a respective neural network output for each of a plurality of inputs, the method comprising, for each of the neural network layers: receiving a plurality of inputs to be processed at the neural network layer; forming one or more batches of inputs from the plurality of inputs, each batch having a number of inputs up to the respective batch size for the neural network layer; selecting a number of the one or more batches of inputs to process, where a count of the inputs in the number of the one or more batches is greater than or equal to the respective associated batch size of a subsequent layer in the sequence; and processing the number of the one or more batches of inputs to generate the respective neural network layer output.
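A hedged sketch of the per-layer loop the abstract describes follows; the helper names are hypothetical, and only the two batching rules (batches hold up to the layer's batch size; selected batches must cover at least the subsequent layer's batch size) come from the abstract.

```python
from typing import List, Sequence

def form_batches(inputs: Sequence, batch_size: int) -> List[list]:
    # Form one or more batches, each holding up to `batch_size` inputs.
    return [list(inputs[i:i + batch_size])
            for i in range(0, len(inputs), batch_size)]

def select_batches(batches: List[list], next_layer_batch_size: int) -> List[list]:
    # Select batches until their combined input count is greater than or
    # equal to the batch size of the subsequent layer in the sequence.
    selected, count = [], 0
    for batch in batches:
        selected.append(batch)
        count += len(batch)
        if count >= next_layer_batch_size:
            break
    return selected

# Hypothetical example: 10 inputs, layer batch size 4, next layer needs 8.
batches = form_batches(list(range(10)), batch_size=4)     # sizes 4, 4, 2
ready = select_batches(batches, next_layer_batch_size=8)  # first two batches
```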
27 Claims
1. A method for performing neural network computations using hardware circuitry comprising a hardware matrix computation unit, the neural network computations being for a neural network having a plurality of neural network layers, the method comprising:
- obtaining, using the hardware circuitry, a plurality of layer inputs to be processed;
- based on (i) a size of a layer input to a particular neural network layer of the plurality of neural network layers and (ii) a weight reuse value representing a number of times that the hardware matrix computation unit of the hardware circuitry reuses weight inputs for neural network computations, determining, using the hardware circuitry, a batch size for the particular neural network layer, wherein the batch size represents a number of batches to be processed in parallel by the hardware matrix computation unit for the particular neural network layer; and
- processing, by the hardware matrix computation unit and for the particular neural network layer, one or more batches of layer inputs to generate one or more layer outputs, wherein each batch of the one or more batches includes a number of layer inputs corresponding to the batch size for the particular neural network layer.

Dependent claims: 2–10.
11. A system for performing neural network computations using hardware circuitry comprising a hardware matrix computation unit, the neural network computations being for a neural network having a plurality of neural network layers, the system comprising:
- one or more processors; and
- a non-transitory computer-readable medium coupled to the one or more processors and having instructions stored thereon, which, when executed by the one or more processors, cause performance of operations comprising:
  - obtaining, using the hardware circuitry, a plurality of layer inputs to be processed;
  - based on (i) a size of a layer input to a particular neural network layer of the plurality of neural network layers and (ii) a weight reuse value representing a number of times that the hardware matrix computation unit of the hardware circuitry reuses weight inputs for neural network computations, determining, using the hardware circuitry, a batch size for the particular neural network layer, wherein the batch size represents a number of batches to be processed in parallel by the hardware matrix computation unit for the particular neural network layer; and
  - processing, by the hardware matrix computation unit and for the particular neural network layer, one or more batches of layer inputs to generate one or more layer outputs, wherein each batch of the one or more batches includes a number of layer inputs corresponding to the batch size for the particular neural network layer.

Dependent claims: 12–18.
19. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause performance of operations comprising:
- obtaining, for a neural network having a plurality of neural network layers, a plurality of layer inputs to be processed, wherein the plurality of layer inputs are obtained using hardware circuitry comprising a hardware matrix computation unit;
- based on (i) a size of a layer input to a particular neural network layer of the plurality of neural network layers and (ii) a weight reuse value representing a number of times that the hardware matrix computation unit of the hardware circuitry reuses weight inputs for neural network computations, determining, using the hardware circuitry, a batch size for the particular neural network layer, wherein the batch size represents a number of batches to be processed in parallel by the hardware matrix computation unit for the particular neural network layer; and
- processing, by the hardware matrix computation unit and for the particular neural network layer, one or more batches of layer inputs to generate one or more layer outputs, wherein each batch of the one or more batches includes a number of layer inputs corresponding to the batch size for the particular neural network layer.

Dependent claims: 20–27.
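The processing step common to all three independent claims, running an entire batch through the matrix computation unit against a single resident set of weights, can be illustrated with NumPy standing in for the hardware; this is an assumption for exposition, not the patent's implementation.

```python
import numpy as np

def process_batch(weights: np.ndarray, batch_inputs: np.ndarray) -> np.ndarray:
    """Process a whole batch against one resident weight matrix.

    weights:      (input_size, output_size), loaded into the matrix unit once
    batch_inputs: (batch, input_size), each row is one layer input
    Returns:      (batch, output_size) layer outputs
    """
    # One weight load is amortized across every input in the batch,
    # which is the point of sizing batches to the weight reuse value.
    return batch_inputs @ weights
```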
Specification