
Rotating data for neural network computations

  • US 9,805,303 B2
  • Filed: 09/03/2015
  • Issued: 10/31/2017
  • Est. Priority Date: 05/21/2015
  • Status: Active Grant
First Claim

1. A method of computing a layer output for a convolutional neural network layer from a layer input to the convolutional neural network layer using a hardware matrix computation unit comprising a hardware two-dimensional systolic array, wherein the convolutional neural network layer has a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, wherein the convolutional neural network layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, and wherein the method comprises:

    receiving, at the hardware matrix computation unit, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;

    forming, by the hardware matrix computation unit, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;

    sending, by the hardware matrix computation unit, the plurality of vector inputs to one or more cells along a first dimension of the hardware systolic array, comprising transmitting a respective control signal to each of one or more cells in the systolic array that causes the cell to select an activation input from the plurality of vector inputs to store in a register of the cell;

    generating, by the hardware matrix computation unit, a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension;

    sending, by the hardware matrix computation unit, each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the hardware systolic array;

    generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:

        causing the hardware systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and

        generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
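As a rough illustration of why rotated kernel structures let one vector input serve several convolution positions in parallel, here is a minimal sketch that reduces the problem to one spatial dimension. It is one plausible reading of the claim under assumptions that are not stated in it: each kernel is zero-padded to the width of a vector input, a "rotation" is taken to be a cyclic shift of that padded kernel, a plain matrix product stands in for the systolic array's accumulation, and ReLU is a hypothetical choice for the vector computation unit. All function names (rotate, form_vector_inputs, accumulate, layer_output, and so on) are hypothetical and do not describe the patented hardware.

```python
"""Hypothetical 1-D sketch of convolution via rotated kernel structures.

Assumptions (not from the claim): zero-padded kernels, cyclic shifts as the
rotation, a plain matrix product in place of the systolic array, and ReLU as
the vector unit's function.
"""
from typing import List

Vector = List[float]


def rotate(kernel: Vector, shift: int) -> Vector:
    """Cyclically shift the kernel's elements right by `shift` positions."""
    n = len(kernel)
    return [kernel[(i - shift) % n] for i in range(n)]


def form_vector_inputs(activations: Vector, width: int, stride: int) -> List[Vector]:
    """Cut the activation input into regions of `width` values, one every `stride` positions."""
    return [activations[i:i + width]
            for i in range(0, len(activations) - width + 1, stride)]


def accumulate(inputs: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Stand-in for the systolic array: dot product of every input with every weight vector."""
    return [[sum(a * w for a, w in zip(x, k)) for k in weights] for x in inputs]


def relu(values: Vector) -> Vector:
    """Hypothetical nonlinearity applied by the vector computation unit."""
    return [max(0.0, v) for v in values]


def direct_conv(activations: Vector, kernel: Vector) -> Vector:
    """Reference 'valid' convolution (cross-correlation, as in most CNN frameworks)."""
    k = len(kernel)
    return [sum(activations[i + j] * kernel[j] for j in range(k))
            for i in range(len(activations) - k + 1)]


def layer_output(activations: Vector, kernel: Vector, region_width: int) -> Vector:
    k = len(kernel)
    if region_width < k:
        raise ValueError("region_width must be at least the kernel length")
    # Number of rotations that keep every nonzero weight inside the region.
    shifts = region_width - k + 1
    if (len(activations) - region_width) % shifts != 0:
        raise ValueError("regions must tile the valid output positions exactly")
    padded = kernel + [0.0] * (region_width - k)
    # The kernel structure (shift 0) plus its rotated kernel structures.
    kernels = [rotate(padded, s) for s in range(shifts)]
    # Consecutive regions start `shifts` apart, so their outputs are contiguous.
    regions = form_vector_inputs(activations, region_width, stride=shifts)
    acc = accumulate(regions, kernels)
    # Row r, column s of the accumulated output is the output at position r * shifts + s.
    return relu([v for row in acc for v in row])


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    w = [1.0, 2.0, 1.0]
    print(layer_output(x, w, region_width=5))  # [8.0, 12.0, 16.0, 20.0, 24.0, 28.0]
    print(relu(direct_conv(x, w)))              # same values
```

Each rotation lines the padded kernel up with a different offset inside the same region, so a single vector input produces several neighbouring output positions in one pass. The equivalence with direct convolution only holds while no nonzero weight wraps around the end of the padded kernel, which is why the sketch limits the number of shifts to region_width - len(kernel) + 1.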

  • 2 Assignments