SYSTOLIC CONVOLUTIONAL NEURAL NETWORK
First Claim
1. A circuit for performing convolutional neural network computations for a neural network, the circuit comprising:
a transposing buffer configured to receive actuation feature vectors along a first dimension of the transposing buffer and to output feature component vectors along a second dimension of the transposing buffer;
a weight buffer configured to store kernel weight vectors along a first dimension of the weight buffer and further configured to output kernel component vectors along a second dimension of the weight buffer; and
a systolic array configured to receive the kernel weight vectors along a first dimension of the systolic array and to receive the feature component vectors along a second dimension of the systolic array, where the systolic array comprises an array of multiply and accumulate (MAC) processing cells.
1 Assignment
0 Petitions
Abstract
A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
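The transposing behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name `TransposingBuffer`, its `shift_in` and `shift_out` methods, and the choice to treat the first dimension as rows and the second as columns are all hypothetical.

```python
# Hypothetical software model of a transposing buffer (illustrative names only).
# Assumption: vectors enter as rows and leave as columns.
class TransposingBuffer:
    def __init__(self, depth, width):
        self.depth = depth   # how many feature vectors the buffer holds
        self.width = width   # components per feature vector
        self.rows = []

    def shift_in(self, feature_vector):
        """Load one whole feature vector along the first dimension."""
        if len(feature_vector) != self.width:
            raise ValueError("feature vector has the wrong length")
        if len(self.rows) == self.depth:
            raise OverflowError("buffer is full")
        self.rows.append(list(feature_vector))

    def shift_out(self):
        """Read one feature component vector along the second dimension.

        The result holds the same component taken from every stored vector.
        """
        if not self.rows or not self.rows[0]:
            raise IndexError("buffer is empty")
        return [row.pop(0) for row in self.rows]


buf = TransposingBuffer(depth=3, width=2)
for vec in ([1, 2], [3, 4], [5, 6]):
    buf.shift_in(vec)

first = buf.shift_out()   # component 0 of every vector
second = buf.shift_out()  # component 1 of every vector
print(first, second)      # [1, 3, 5] [2, 4, 6]
```

Each read returns one component from every stored vector, which is the form the abstract says is fed to the second dimension of the systolic array.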
42 Citations
19 Claims
1. A circuit for performing convolutional neural network computations for a neural network, the circuit comprising:
a transposing buffer configured to receive actuation feature vectors along a first dimension of the transposing buffer and to output feature component vectors along a second dimension of the transposing buffer;
a weight buffer configured to store kernel weight vectors along a first dimension of the weight buffer and further configured to output kernel component vectors along a second dimension of the weight buffer; and
a systolic array configured to receive the kernel weight vectors along a first dimension of the systolic array and to receive the feature component vectors along a second dimension of the systolic array, where the systolic array comprises an array of multiply and accumulate (MAC) processing cells.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for performing convolution neural network computations for a neural network, the method comprising:
loading input feature vectors into a transposing buffer along a first dimension of the transposing buffer;
loading kernel weight vectors along a first dimension of a weight buffer;
for each of a plurality of processing cycles:
  outputting kernel component vectors from a second dimension of the weight buffer to a first dimension of a systolic array, where the second dimension is perpendicular to the first dimension;
  outputting feature component vectors from a second dimension of the transposing buffer to a second dimension of the systolic array, where the second dimension is perpendicular to the first dimension and where the first dimension is perpendicular to the second dimension; and
  in each cell of the systolic array, accumulating a product of a feature component and a kernel component; and
outputting accumulated products of the cells of the systolic array to an output layer of the neural network.
Dependent claims: 14, 15, 16, 17, 18, 19
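The per-cycle steps of the method in claim 13 can be sketched in software as follows. This is a simplified functional model, not the claimed hardware: the function name `systolic_accumulate` and the data layout are hypothetical, each cell is assumed to keep its own running sum (consistent with the abstract's statement that each processing cell is associated with an output value), and the cycle-by-cycle skewing of data as it propagates through a physical systolic array is ignored.

```python
def systolic_accumulate(feature_vectors, kernel_vectors):
    """Functional sketch of an array of MAC cells (hypothetical helper).

    feature_vectors: list of P vectors, each with C components.
    kernel_vectors:  list of K vectors, each with C components.
    Returns a K x P grid; cell (i, j) holds the dot product of
    kernel vector i and feature vector j.
    """
    num_cycles = len(feature_vectors[0])
    if any(len(v) != num_cycles for v in feature_vectors + kernel_vectors):
        raise ValueError("all vectors must have the same number of components")

    # One accumulator per cell, cleared before the first cycle.
    acc = [[0 for _ in feature_vectors] for _ in kernel_vectors]

    for cycle in range(num_cycles):
        # Component `cycle` of every kernel vector feeds one array dimension.
        kernel_components = [k[cycle] for k in kernel_vectors]
        # Component `cycle` of every feature vector feeds the other dimension.
        feature_components = [f[cycle] for f in feature_vectors]
        # Every cell multiplies its two inputs and adds to its running sum.
        for i, w in enumerate(kernel_components):
            for j, x in enumerate(feature_components):
                acc[i][j] += w * x

    return acc


features = [[1, 2, 3], [4, 5, 6]]
kernels = [[1, 0, -1], [2, 2, 2]]
result = systolic_accumulate(features, kernels)
print(result)  # [[-2, -2], [12, 30]]
```

After the last cycle each cell holds one complete dot product, so the grid as a whole is the matrix product of the kernel vectors with the feature vectors.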
Specification