Rotating data for neural network computations

  • US 9,747,548 B2
  • Filed: 12/22/2016
  • Issued: 08/29/2017
  • Est. Priority Date: 05/21/2015
  • Status: Active Grant
First Claim

1. A method for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a hardware matrix computation unit comprising circuitry for a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel comprising a kernel structure having a respective matrix structure of weights, and where the convolutional layer generates the layer output based at least in part on performing a respective convolution between each kernel and an activation input to the convolutional neural network layer, the method comprising:

  • receiving, at a hardware circuit for performing neural network computations, a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix;

    forming, based on control signals from hardware circuitry for a host device, a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix;

    sending, based on control signals from hardware circuitry for a sequencer, the plurality of vector inputs to one or more cells along a first dimension of the systolic array;

    for each of the plurality of kernels, generating a plurality of rotated kernel structures from the respective matrix structure of weights for the kernel, where the respective matrix structure of weights for the kernel is a multi-dimensional structure and generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along at least one dimension of the respective matrix structure;

    sending each kernel structure and each rotated kernel structure to a respective distinct cell along a second dimension of the systolic array; and

    generating the layer output by performing respective convolutions in parallel using the kernel structures and the rotated kernel structures, comprising:

    causing the systolic array to generate an accumulated output based on the plurality of vector inputs and the kernel structures and the rotated kernel structures; and

    generating, using hardware circuitry for a vector computation unit, the layer output from the accumulated output.
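
The data flow recited in the claim can be illustrated in software. The following is a minimal sketch only, not the patented hardware implementation: it assumes a two-dimensional activation matrix, treats "shifting elements" as a cyclic shift, models the systolic array as one dot product per pairing of vector input and kernel structure, and uses ReLU as a stand-in for the vector computation unit. All function names (rotate_kernel, form_vector_inputs, accumulate, vector_unit) are hypothetical and do not appear in the patent.

```python
def rotate_kernel(kernel, row_shift=0, col_shift=0):
    """Cyclically shift kernel elements along rows and/or columns (assumed reading of 'shifting')."""
    rows, cols = len(kernel), len(kernel[0])
    return [
        [kernel[(r - row_shift) % rows][(c - col_shift) % cols] for c in range(cols)]
        for r in range(rows)
    ]


def form_vector_inputs(activations, size):
    """Flatten each size x size region of a 2-D activation matrix into one vector input."""
    height, width = len(activations), len(activations[0])
    vectors = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            vectors.append([
                activations[top + r][left + c]
                for r in range(size)
                for c in range(size)
            ])
    return vectors


def accumulate(vector_inputs, kernel_structures):
    """Software stand-in for the systolic array: one dot product per (vector, structure) pair."""
    # Each kernel structure (original or rotated) is flattened into one column of weights.
    columns = [[w for row in k for w in row] for k in kernel_structures]
    return [
        [sum(a * w for a, w in zip(vec, col)) for col in columns]
        for vec in vector_inputs
    ]


def vector_unit(accumulated):
    """Stand-in for the vector computation unit; ReLU is an assumed activation."""
    return [[max(0, value) for value in row] for row in accumulated]


kernel = [[1, 2], [3, 4]]
# The original kernel structure plus one structure rotated by one column.
structures = [kernel, rotate_kernel(kernel, col_shift=1)]
activations = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
vectors = form_vector_inputs(activations, 2)
layer_output = vector_unit(accumulate(vectors, structures))
```

In this sketch each row of layer_output corresponds to one vector input, and each column corresponds to one kernel structure, so the original and rotated structures are evaluated against the same vector inputs in a single pass.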

  • 2 Assignments