Rotating data for neural network computations

US 9,805,303 B2
Filed: 09/03/2015
Issued: 10/31/2017
Est. Priority Date: 05/21/2015
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving a plurality of activation inputs represented as a multi-dimensional matrix; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix; sending the plurality of vector inputs to one or more cells along a first dimension of a systolic array; generating a plurality of rotated kernel structures from each of a plurality of kernels; sending each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array; causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the plurality of kernels; and generating the layer output from the accumulated output.
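
The vector-forming step in the abstract can be pictured with a small software model. The sketch below is an illustration only, not the patented hardware. It assumes a two-dimensional activation matrix, assumes the distinct regions are non-overlapping k x k tiles, and flattens each tile row by row. The name `form_vector_inputs` and the tiling order are hypothetical.

```python
def form_vector_inputs(activations, k):
    """Split a 2-D activation matrix into k x k regions.

    Assumption (not from the patent text): regions are non-overlapping
    tiles visited left to right, top to bottom, and each tile is
    flattened row by row into one vector input.
    """
    height = len(activations)
    width = len(activations[0])
    vectors = []
    for top in range(0, height - k + 1, k):
        for left in range(0, width - k + 1, k):
            region = [
                activations[top + r][left + c]
                for r in range(k)
                for c in range(k)
            ]
            vectors.append(region)
    return vectors


# A 4 x 4 activation matrix holding the values 0..15.
activations = [[4 * r + c for c in range(4)] for r in range(4)]
vector_inputs = form_vector_inputs(activations, 2)
```

With a 4 x 4 matrix and k = 2 this yields four vector inputs, one per distinct 2 x 2 region.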

21 Claims

1. A method of computing a layer output for a convolutional neural network layer from a layer input to the convolutional neural network layer using a hardware matrix computation unit comprising a hardware two-dimensional systolic array, wherein the convolutional neural network layer has a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, wherein the convolutional neural network layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, and wherein the method comprises:
- receiving, at the hardware matrix computation unit, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
- forming, by the hardware matrix computation unit, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
- sending, by the hardware matrix computation unit, the plurality of vector inputs to one or more cells along a first dimension of the hardware systolic array, comprising transmitting a respective control signal to each of one or more cells in the systolic array that causes the cell to select an activation input from the plurality of vector inputs to store in a register of the cell;
- generating, by the hardware matrix computation unit, a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension;
- sending, by the hardware matrix computation unit, each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the hardware systolic array;
- generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
  - causing the hardware systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
  - generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 3. The method of claim 2, where sending the plurality of vector inputs to one or more cells comprises:
    - sending, for a particular row of the systolic array, a respective element from each vector input to the particular row; and
    - selecting, at each cell in the particular row, one of the respective elements for use in a register in the cell based on the control signal for the cell.
  - 4. The method of claim 2, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - sending each vector input to a distinct series of shift registers, each shift register shifting an element of the vector input to a subsequent shift register on a subsequent clock cycle, each shift register corresponding to a respective row in the systolic array; and
    - selecting, for each row, an output from the corresponding shift registers for use in the row.
  - 5. The method of claim 1, where forming a plurality of vector inputs from the plurality of activation inputs is based on a size of a particular kernel structure, further comprising:
    - overlapping the particular kernel structure with the matrix representation of the plurality of activation inputs to form a first vector input from elements in the matrix representation; and
    - forming one or more other vector inputs from other elements that surround the overlapped particular kernel structure.
  - 6. The method of claim 1, where generating the layer output from the accumulated output comprises normalizing the accumulated output, pooling the accumulated output, or both, to generate the layer output.
  - 7. The method of claim 1, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - at a particular clock cycle, storing a first vector input in the plurality of vector inputs in a first cell of the systolic array; and
    - at a subsequent clock cycle, shifting the first vector input in the first cell to a second cell that is adjacent to the first cell and storing a second vector input in the plurality of vector inputs in the first cell.
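
Claim 1 describes a rotated kernel structure as the kernel's weight matrix with its elements shifted along one dimension. A minimal Python sketch of that idea follows. It assumes the shift is cyclic, so elements pushed off one edge re-enter at the other, which the claim language does not state. The names `rotate_kernel` and `rotated_kernel_structures` are hypothetical.

```python
def rotate_kernel(kernel, shift, axis=1):
    """Return a copy of a 2-D kernel (list of row lists) shifted along one axis.

    Assumption: the shift wraps around (cyclic). axis=1 shifts elements
    within each row; axis=0 shifts whole rows.
    """
    if axis == 1:
        s = shift % len(kernel[0])
        return [row[-s:] + row[:-s] if s else list(row) for row in kernel]
    s = shift % len(kernel)
    shifted = kernel[-s:] + kernel[:-s] if s else kernel
    return [list(row) for row in shifted]


def rotated_kernel_structures(kernel, axis=1):
    """Generate every distinct non-zero cyclic shift of the kernel along `axis`."""
    size = len(kernel[0]) if axis == 1 else len(kernel)
    return [rotate_kernel(kernel, s, axis) for s in range(1, size)]


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
rotations = rotated_kernel_structures(kernel)
```

For a 3 x 3 kernel this produces two rotated structures in addition to the original, so three structures in total could be sent along the second dimension of the array.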

8. A system for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer, the convolutional neural network layer having a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, wherein the convolutional neural network layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, and wherein the system comprises:
- a hardware matrix computation unit comprising a hardware two-dimensional systolic array; and
- a storage device having instructions stored thereon, which, when executed by the hardware matrix computation unit, cause the hardware matrix computation unit to perform operations comprising:
  - receiving, at the hardware matrix computation unit, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
  - forming, by the hardware matrix computation unit, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
  - sending, by the hardware matrix computation unit, the plurality of vector inputs to one or more cells along a first dimension of the hardware systolic array, comprising transmitting a respective control signal to each of one or more cells in the systolic array that causes the cell to select an activation input from the plurality of vector inputs to store in a register of the cell;
  - generating, by the hardware matrix computation unit, a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension;
  - sending, by the hardware matrix computation unit, each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the hardware systolic array;
  - generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
    - causing the hardware systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
    - generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 10. The system of claim 9, where sending the plurality of vector inputs to one or more cells comprises:
    - sending, for a particular row of the systolic array, a respective element from each vector input to the particular row; and
    - selecting, at each cell in the particular row, one of the respective elements for use in a register in the cell based on the control signal for the cell.
  - 11. The system of claim 9, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - sending each vector input to a distinct series of shift registers, each shift register shifting an element of the vector input to a subsequent shift register on a subsequent clock cycle, each shift register corresponding to a respective row in the systolic array; and
    - selecting, for each row, an output from the corresponding shift registers for use in the row.
  - 12. The system of claim 8, where forming a plurality of vector inputs from the plurality of activation inputs is based on a size of a particular kernel structure, further comprising:
    - overlapping the particular kernel structure with the matrix representation of the plurality of activation inputs to form a first vector input from elements in the matrix representation; and
    - forming one or more other vector inputs from other elements that surround the overlapped particular kernel structure.
  - 13. The system of claim 8, where generating the layer output from the accumulated output comprises normalizing the accumulated output, pooling the accumulated output, or both, to generate the layer output.
  - 14. The system of claim 8, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - at a particular clock cycle, storing a first vector input in the plurality of vector inputs in a first cell of the systolic array; and
    - at a subsequent clock cycle, shifting the first vector input in the first cell to a second cell that is adjacent to the first cell and storing a second vector input in the plurality of vector inputs in the first cell.
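
The last two operations of claim 8, together with claim 13, can be modeled functionally in a few lines. This is a sketch under stated assumptions rather than a description of the claimed hardware: the systolic array is reduced to plain dot products with no timing, each kernel structure is assumed to be flattened to the same length as a vector input, and the vector computation unit is assumed to apply max pooling with a window of 2. The claims say only "pooling", so the pooling type and all function names are hypothetical.

```python
def accumulate(vector_inputs, kernel_vectors):
    """Functional stand-in for the systolic array.

    Returns one row per vector input and one column per (flattened)
    kernel structure; each entry is a multiply-accumulate (dot product).
    """
    return [
        [sum(a * w for a, w in zip(vec, ker)) for ker in kernel_vectors]
        for vec in vector_inputs
    ]


def max_pool_1d(values, window):
    """Max over consecutive, non-overlapping windows (assumed pooling type)."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def layer_output(accumulated, window=2):
    """Stand-in for the vector computation unit: pool each kernel's outputs."""
    per_kernel = [list(column) for column in zip(*accumulated)]
    return [max_pool_1d(column, window) for column in per_kernel]


vector_inputs = [[1, 2], [3, 4], [5, 6], [7, 8]]
kernel_vectors = [[1, 0], [0, 1]]  # two flattened kernel structures
accumulated = accumulate(vector_inputs, kernel_vectors)
pooled = layer_output(accumulated)
```

Normalization, the other option named in claim 13, would slot into `layer_output` in the same place as the pooling call.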

15. A computer-readable medium having instructions stored thereon, which, when executed by a hardware matrix computation unit comprising a hardware two-dimensional systolic array, cause the hardware matrix computation unit to perform operations for computing a layer output for a convolutional neural network layer from a layer input to the convolutional neural network layer using the hardware matrix computation unit, wherein the convolutional neural network layer has a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, wherein the convolutional neural network layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, the operations comprising:
- receiving, at the hardware matrix computation unit, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;
- forming, by the hardware matrix computation unit, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;
- sending, by the hardware matrix computation unit, the plurality of vector inputs to one or more cells along a first dimension of the systolic array, comprising transmitting a respective control signal to each of one or more cells in the systolic array that causes the cell to select an activation input from the plurality of vector inputs to store in a register of the cell;
- generating, by the hardware matrix computation unit, a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension;
- sending, by the hardware matrix computation unit, each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array;
- generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:
  - causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and
  - generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
- Dependent claims (16, 17, 18, 19, 20, 21):
  - 16. The computer-readable medium of claim 15, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 17. The computer-readable medium of claim 16, where sending the plurality of vector inputs to one or more cells comprises:
    - sending, for a particular row of the systolic array, a respective element from each vector input to the particular row; and
    - selecting, at each cell in the particular row, one of the respective elements for use in a register in the cell based on the control signal for the cell.
  - 18. The computer-readable medium of claim 16, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - sending each vector input to a distinct series of shift registers, each shift register shifting an element of the vector input to a subsequent shift register on a subsequent clock cycle, each shift register corresponding to a respective row in the systolic array; and
    - selecting, for each row, an output from the corresponding shift registers for use in the row.
  - 19. The computer-readable medium of claim 15, where forming a plurality of vector inputs from the plurality of activation inputs is based on a size of a particular kernel structure, further comprising:
    - overlapping the particular kernel structure with the matrix representation of the plurality of activation inputs to form a first vector input from elements in the matrix representation; and
    - forming one or more other vector inputs from other elements that surround the overlapped particular kernel structure.
  - 20. The computer-readable medium of claim 15, where generating the layer output from the accumulated output comprises normalizing the accumulated output, pooling the accumulated output, or both, to generate the layer output.
  - 21. The computer-readable medium of claim 15, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises:
    - at a particular clock cycle, storing a first vector input in the plurality of vector inputs in a first cell of the systolic array; and
    - at a subsequent clock cycle, shifting the first vector input in the first cell to a second cell that is adjacent to the first cell and storing a second vector input in the plurality of vector inputs in the first cell.
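
The clocked behavior recited in claim 21 (and in claims 7 and 14) can be simulated as a simple shift chain. The sketch below is illustrative only: it models one row of cells as a Python list, treats each clock cycle as a single function call, and assumes a value shifted past the last cell is discarded. The names `clock_cycle` and `feed_vector_inputs` are hypothetical.

```python
def clock_cycle(cells, incoming):
    """Advance one clock cycle.

    Every stored value moves to the adjacent cell and the incoming
    vector input is stored in the first cell. Assumption: the value in
    the last cell falls off the end of the chain.
    """
    return [incoming] + cells[:-1]


def feed_vector_inputs(vector_inputs, num_cells):
    """Feed vector inputs one per cycle; return the cell contents after each cycle."""
    cells = [None] * num_cells  # None marks a cell that holds nothing yet
    snapshots = []
    for vec in vector_inputs:
        cells = clock_cycle(cells, vec)
        snapshots.append(list(cells))
    return snapshots


snapshots = feed_vector_inputs([[1, 2], [3, 4]], num_cells=3)
```

After the first cycle only the first cell holds a vector input. After the second cycle that vector has moved to the adjacent cell and the second vector occupies the first cell, matching the two steps in the claim.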

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ross, Jonathan; Thorson, Gregory Michael
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Pellett, Daniel
Application Number: US14/845,022
Publication Number: US 20160342893A1
Time in Patent Office: 789 Days
Field of Search: None

CPC Class Codes

G06F 15/8046   Systolic arrays
G06F 17/153   Multidimensional correlatio...
G06N 3/045   Combinations of networks
G06N 3/063   using electronic means
G06N 3/08   Learning methods
G06N 5/04   Inference or reasoning models
