SYSTOLIC CONVOLUTIONAL NEURAL NETWORK

US 20190311243A1
Filed: 04/05/2018
Published: 10/10/2019
Est. Priority Date: 04/05/2018
Status: Active Grant

Abstract

A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
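
The transposing behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name `TransposingBuffer`, its method names, and the zero fill for vacated entries are all hypothetical choices made for illustration. Whole feature vectors are shifted in along one dimension (rows), and component vectors (one component from each stored vector) are shifted out along the other (columns).

```python
class TransposingBuffer:
    """Hypothetical software model of a transposing buffer.

    Rows hold whole feature vectors (first dimension); columns hold
    one component taken from each stored vector (second dimension).
    """

    def __init__(self, n_vectors, n_components):
        self.rows = [[0] * n_components for _ in range(n_vectors)]

    def shift_in(self, vector):
        # Existing vectors move one row away from the input edge;
        # the oldest row falls off and the new vector takes the edge row.
        self.rows = self.rows[1:] + [list(vector)]

    def shift_out(self):
        # Emit the column at the output edge, then shift every row by
        # one component so the next column reaches that edge.
        column = [row[0] for row in self.rows]
        self.rows = [row[1:] + [0] for row in self.rows]
        return column


buf = TransposingBuffer(n_vectors=3, n_components=2)
for vec in ([1, 2], [3, 4], [5, 6]):
    buf.shift_in(vec)
first = buf.shift_out()   # first component of every stored vector
second = buf.shift_out()  # second component of every stored vector
```

In this model, vectors written row by row come back out column by column, which is the reordering the systolic array needs on its second input edge.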

42 Citations

19 Claims

1. A circuit for performing convolutional neural network computations for a neural network, the circuit comprising:
- a transposing buffer configured to receive actuation feature vectors along a first dimension of the transposing buffer and to output feature component vectors along a second dimension of the transposing buffer;
  
  a weight buffer configured to store kernel weight vectors along a first dimension of the weight buffer and further configured to output kernel component vectors along a second dimension of the weight buffer; and
  
  a systolic array configured to receive the kernel weight vectors along a first dimension of the systolic array and to receive the feature component vectors along a second dimension of the systolic array, where the systolic array comprises an array of multiply and accumulate (MAC) processing cells.
  - 2. The circuit of claim 1, where the feature component vectors and the kernel component vectors are pipelined into the systolic array.
  - 3. The circuit of claim 1, where the feature component vectors and the kernel component vectors are broadcast into the systolic array.
  - 4. The circuit of claim 1, where the actuation feature vectors are shifted into the transposing buffer along the first dimension of the transposing buffer and output feature component vectors are shifted out of the transposing buffer along the second dimension.
  - 5. The circuit of claim 1, where the systolic array is further configured to pass the kernel weight vectors to neighboring processing cells in the second dimension of the systolic array and to pass the feature component vectors to neighboring processing cells in the first dimension of the systolic array.
  - 6. The circuit of claim 1, where the systolic array is further configured to output values accumulated in the processing cells, where each processing cell is associated with an output value.
  - 7. The circuit of claim 1, further comprising an output layer configured to receive accumulated values from the MAC processing cells of the systolic array and to perform at least one non-linear, pooling or normalization operation on the received accumulated values.
  - 8. The circuit of claim 1, where the values of the feature component vectors or the kernel component vectors are tagged with validity bits, indicative of data validity, and where an accumulator of a MAC processing cell is set to zero when data tagged as invalid is received.
  - 9. The circuit of claim 1, further comprising a control line coupled to the MAC processing cells, where an accumulator of a MAC processing cell is set to zero in response to a signal on the control line.
  - 10. The circuit of claim 1, further comprising an interface to a host data processing system, where the circuit is configured to receive data and commands from the host data processing system via the interface.
  - 11. A non-transitory computer readable medium containing instructions of a hardware description language that define the circuit of claim 1.
  - 12. A non-transitory computer readable medium comprising a netlist representative of the circuit of claim 1.
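
The pipelined dataflow of claims 2, 5 and 6 can be sketched as a cycle-by-cycle simulation. This is a minimal model under assumptions that are not taken from the claims: the function name `systolic_matmul`, the skewed entry timing, and the use of `None` as a pipeline bubble are all illustrative. In particular, the sketch simply skips bubbles; it does not model the accumulator reset of claims 8 and 9.

```python
def systolic_matmul(W, X):
    """Simulate an output-stationary array of MAC cells (illustrative).

    W: R x K kernel weights, one kernel per array row.
    X: K x C feature components, one component vector per cycle.
    Cell (i, j) accumulates sum over k of W[i][k] * X[k][j].
    """
    R, K, C = len(W), len(W[0]), len(X[0])
    acc = [[0] * C for _ in range(R)]
    w_reg = [[None] * C for _ in range(R)]  # kernel value held per cell
    x_reg = [[None] * C for _ in range(R)]  # feature value held per cell

    for t in range(K + R + C - 2):
        new_w = [[None] * C for _ in range(R)]
        new_x = [[None] * C for _ in range(R)]
        for i in range(R):
            for j in range(C):
                # Kernel values enter at column 0 (row i delayed by i
                # cycles) and pass to the neighboring cell each cycle.
                if j == 0:
                    k = t - i
                    new_w[i][j] = W[i][k] if 0 <= k < K else None
                else:
                    new_w[i][j] = w_reg[i][j - 1]
                # Feature values enter at row 0 (column j delayed by j
                # cycles) and pass to the neighboring cell each cycle.
                if i == 0:
                    k = t - j
                    new_x[i][j] = X[k][j] if 0 <= k < K else None
                else:
                    new_x[i][j] = x_reg[i - 1][j]
        w_reg, x_reg = new_w, new_x
        # Every cell with two valid operands does one multiply-accumulate.
        for i in range(R):
            for j in range(C):
                if w_reg[i][j] is not None and x_reg[i][j] is not None:
                    acc[i][j] += w_reg[i][j] * x_reg[i][j]
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because both operands for index k reach cell (i, j) at cycle k + i + j, each cell ends up holding one output value, matching the statement that each processing cell is associated with an output value.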

13. A method for performing convolution neural network computations for a neural network, the method comprising:
- loading input feature vectors into a transposing buffer along a first dimension of the transposing buffer;
  
  loading kernel weight vectors along a first dimension of a weight buffer;
  
  for each of a plurality of processing cycles:
  
  outputting kernel component vectors from a second dimension of the weight buffer to a first dimension of a systolic array, where the second dimension is perpendicular to the first dimension;
  
  outputting feature component vectors from a second dimension of the transposing buffer to a second dimension of the systolic array, where the second dimension is perpendicular to the first dimension and where the first dimension is perpendicular to the second dimension; and
  
  in each cell of the systolic array, accumulating a product of a feature component and a kernel component; and
  
  outputting accumulated products of the cells of the systolic array to an output layer of the neural network.
  - 14. The method of claim 13, further comprising, for each of the plurality of processing cycles:
    - passing the kernel weight vectors to neighboring cells in the second dimension of the systolic array; and
      
      passing the feature component vectors to neighboring cells in the first dimension of the systolic array.
  - 15. The method of claim 13, further comprising, for each of the plurality of processing cycles:
    - broadcasting the kernel weight vectors to cells in the second dimension of the systolic array; and
      
      broadcasting the feature component vectors to cells in the first dimension of the systolic array.
  - 16. The method of claim 13, where loading input feature vectors into the transposing buffer along a first dimension of the transposing buffer comprises:
    - shifting data stored in the transposing buffer in the second dimension; and
      
      loading an input feature vector along an edge of the transposing buffer in the first dimension.
  - 17. The method of claim 13, where outputting feature component vectors from the second dimension of the transposing buffer to the second dimension of the systolic array comprises:
    - shifting data stored in the transposing buffer in the first dimension; and
      
      outputting a feature component vector along an edge of the transposing buffer in the second dimension.
  - 18. The method of claim 13, where a kernel weight vector is applied to a patch of pixels in an image, and where:
    - an input feature vector comprises color components of pixels in the patch; and
      
      a feature component vector comprises a color component of a corresponding pixel in each of a plurality of patches.
  - 19. The method of claim 13, where outputting accumulated sum of products of the cells of the systolic array to an output layer of the neural network comprises passing accumulated sum of products between neighboring cells of the systolic array to an edge of the systolic array.
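
The method of claim 13, read together with the broadcast variant of claim 15 and the image example of claim 18, can be sketched in software as follows. This is an illustrative model only: the helper names `extract_patches` and `convolve`, the stride of one, and the absence of padding are assumptions, not details from the claims. Each patch is flattened into an input feature vector; on each processing cycle, one component from every patch is multiplied with the matching component of every kernel and accumulated in a per-cell output.

```python
def extract_patches(image, size):
    """Flatten each size x size patch into one input feature vector.

    image: H x W grid of pixels, each pixel a tuple of color components.
    """
    height, width = len(image), len(image[0])
    patches = []
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            vec = []
            for dr in range(size):
                for dc in range(size):
                    vec.extend(image[r + dr][c + dc])
            patches.append(vec)
    return patches


def convolve(image, kernels, size):
    """Accumulate kernel-by-patch products, one component per cycle."""
    patches = extract_patches(image, size)
    acc = [[0] * len(patches) for _ in kernels]
    for k in range(len(patches[0])):  # one processing cycle
        # Same component of every patch (read across the buffer).
        feature_component = [p[k] for p in patches]
        # Same component of every kernel weight vector.
        kernel_component = [w[k] for w in kernels]
        for i, wk in enumerate(kernel_component):
            for j, fk in enumerate(feature_component):
                acc[i][j] += wk * fk  # MAC in cell (i, j)
    return acc


image = [[(1,), (2,), (3,)],
         [(4,), (5,), (6,)],
         [(7,), (8,), (9,)]]
kernels = [[1, 0, 0, 1], [1, 1, 1, 1]]
outputs = convolve(image, kernels, size=2)
```

Here row i of `outputs` holds the response of kernel i at every patch position, so the accumulated values could then be handed to an output layer as in the final step of claim 13.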

Current Assignee
Arm Limited (SoftBank Group Corp.)
Original Assignee
Arm Limited (SoftBank Group Corp.)
Inventors
Whatmough, Paul Nicholas; Bratt, Ian Rudolf; Mattina, Matthew

Granted Patent

US 11,188,814 B2
CPC Class Codes

G06N 3/04   Architecture, e.g. intercon...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means
