Rotating data for neural network computations
First Claim
1. A method for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a hardware matrix computation unit comprising circuitry for a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, and where the convolutional layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, the method comprising:
receiving, at a hardware circuit for performing neural network computations, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
forming, based on control signals from hardware circuitry for a host device, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
sending, based on control signals from hardware circuitry for a sequencer, the plurality of vector inputs to one or more cells along a first dimension of the systolic array;
for each of the plurality of kernels, generating a plurality of rotated kernel structures from the respective matrix structure of weights for the kernel, where the respective matrix structure of weights for the kernel is a multi-dimensional structure and generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along at least one dimension of the respective matrix structure;
sending each kernel structure and each rotated kernel structure to a respective distinct cell along a second dimension of the systolic array; and
generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
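The data flow recited in the claim can be illustrated with a small software sketch. The following is a one-dimensional toy in plain Python, not the claimed hardware: it assumes that a "rotated kernel structure" can be modeled as a zero-padded kernel cyclically shifted by a fixed number of positions, and that each column of the array accumulates a dot product between a shared vector input and the weights held in that column. The function names (`rotate`, `rotated_kernels`, `accumulate`, `sliding_correlation`) and the sample values are hypothetical and do not come from the patent. As is usual for convolutional layers, "convolution" here means a sliding dot product without flipping the kernel.

```python
# Illustrative 1-D toy only; names and values are assumptions, not the patent's.

def rotate(values, shift):
    """Cyclically shift a list to the right by `shift` positions."""
    shift %= len(values)
    return values[-shift:] + values[:-shift] if shift else list(values)

def rotated_kernels(kernel, width):
    """Zero-pad `kernel` to `width`, then return it at every shift that
    does not wrap a weight around the end of the padded structure."""
    padded = list(kernel) + [0] * (width - len(kernel))
    return [rotate(padded, s) for s in range(width - len(kernel) + 1)]

def accumulate(vector_input, columns):
    """Model one pass through the array: every column holds one (rotated)
    kernel and accumulates a dot product with the same vector input."""
    return [sum(a * w for a, w in zip(vector_input, col)) for col in columns]

def sliding_correlation(vector_input, kernel):
    """Reference result computed by sliding the kernel over the input."""
    k = len(kernel)
    return [sum(vector_input[i + j] * kernel[j] for j in range(k))
            for i in range(len(vector_input) - k + 1)]

activations = [2, 0, 1, 3, 5, 4]  # one vector input (hypothetical values)
kernel = [1, 2, 3]                # one kernel with three weights (hypothetical)

columns = rotated_kernels(kernel, len(activations))
outputs = accumulate(activations, columns)

# Each column produced the output for a different kernel position.
assert outputs == sliding_correlation(activations, kernel)
print(outputs)
```

Under these assumptions the activations are loaded once and left in place; only the copy of the kernel differs from column to column, so all output positions are produced in the same pass.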
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving a plurality of activation inputs; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within a multi-dimensional matrix; sending the plurality of vector inputs to one or more cells along a first dimension of a systolic array; generating a plurality of rotated kernel structures from each of a plurality of kernels; sending each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array; causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the plurality of kernels; and generating the layer output from the accumulated output.
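As a two-dimensional illustration of the abstract, the sketch below forms a vector input from one region of a small activation matrix and shifts a kernel along both dimensions of its structure. It is a toy model under stated assumptions, not the disclosed circuit: the kernel is assumed to be embedded in a zero grid the size of the region before shifting, each flattened shifted kernel stands in for the weights sent to one cell along the second dimension, and the helper names (`region`, `shifted_kernel`, `direct`) and all values are hypothetical.

```python
# Illustrative 2-D toy only; helper names and values are assumptions.

def region(matrix, top, left, size):
    """Flatten the size-by-size block of `matrix` at (top, left)
    into one vector input."""
    return [matrix[top + r][left + c]
            for r in range(size) for c in range(size)]

def shifted_kernel(kernel, size, down, right):
    """Embed `kernel` in a size-by-size grid of zeros, shift it `down`
    rows and `right` columns (cyclically), and flatten the grid."""
    grid = [[0] * size for _ in range(size)]
    for r, row in enumerate(kernel):
        for c, weight in enumerate(row):
            grid[(r + down) % size][(c + right) % size] = weight
    return [weight for row in grid for weight in row]

def direct(matrix, kernel, top, left):
    """Reference value: kernel applied directly at (top, left)."""
    return sum(matrix[top + r][left + c] * w
               for r, row in enumerate(kernel)
               for c, w in enumerate(row))

layer_input = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 1, 0, 1],
    [1, 0, 2, 3],
]
kernel = [[1, 2],
          [3, 4]]
size = 3                                   # side length of one region
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]  # shifts that do not wrap around

# One flattened kernel structure per column of the modeled array.
columns = [shifted_kernel(kernel, size, d, r) for d, r in shifts]

# One vector input, taken from the top-left region of the layer input.
vector_input = region(layer_input, 0, 0, size)

# Accumulated output: one dot product per column.
accumulated = [sum(a * w for a, w in zip(vector_input, col))
               for col in columns]

assert accumulated == [direct(layer_input, kernel, d, r) for d, r in shifts]
print(accumulated)
```

In this model a single 3-by-3 region yields the four outputs that a 2-by-2 kernel can produce inside it. Covering the whole layer input would require further regions, and how those regions are chosen is not specified in the abstract.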
22 Claims
1. Independent method claim; the full text appears under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 22.
8. A system for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a hardware matrix computation unit comprising circuitry for a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, and where the convolutional layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, the system comprising:
one or more computers; and
non-transitory computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving, at a hardware circuit for performing neural network computations, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
forming, based on control signals from hardware circuitry for a host device, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
sending, based on control signals from hardware circuitry for a sequencer, the plurality of vector inputs to one or more cells along a first dimension of the systolic array;
for each of the plurality of kernels, generating a plurality of rotated kernel structures from the respective matrix structure of weights for the kernel, where the respective matrix structure of weights for the kernel is a multi-dimensional structure and generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along at least one dimension of the respective matrix structure;
sending each kernel structure and each rotated kernel structure to a respective distinct cell along a second dimension of the systolic array; and
generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer-readable storage medium having instructions stored thereon, which, when executed by one or more computers, cause the one or more computers to perform operations for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a hardware matrix computation unit comprising circuitry for a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, and where the convolutional layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, the operations comprising:
receiving, at a hardware circuit for performing neural network computations, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
forming, based on control signals from hardware circuitry for a host device, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
sending, based on control signals from hardware circuitry for a sequencer, the plurality of vector inputs to one or more cells along a first dimension of the systolic array;
for each of the plurality of kernels, generating a plurality of rotated kernel structures from the respective matrix structure of weights for the kernel, where the respective matrix structure of weights for the kernel is a multi-dimensional structure and generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along at least one dimension of the respective matrix structure;
sending each kernel structure and each rotated kernel structure to a respective distinct cell along a second dimension of the systolic array;
generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
Dependent claims: 16, 17, 18, 19, 20, 21.
Specification