Prefetching weights for use in a neural network processor
First Claim
1. A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising:
- a hardware matrix computation unit comprising circuitry for a systolic array, the systolic array comprising a plurality of cells, each cell of the plurality of cells comprising a weight register disposed within the cell for storing weight inputs received from a source external to the cell;
- hardware circuitry for a weight fetcher unit configured to, for each of the plurality of neural network layers:
  - send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and
- hardware circuitry for a plurality of weight sequencer units that are disposed external to each cell of the plurality of cells, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, each of the plurality of weight sequencer units configured to, for each of the plurality of neural network layers:
  - provide a control value for storage in a control register disposed within the distinct cell coupled to the weight sequencer unit, the control value being used to shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, where each weight input is stored inside a respective cell using the weight register and along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
Abstract
A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
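The shifting described in the abstract can be illustrated with a small software model of one column of cells. This is only a sketch under stated assumptions, not the patented circuit: the names `Cell`, `in_transit`, and `load_column` are hypothetical, and the control value is assumed here to count the remaining hops before a weight is latched into a weight register. The text above says only that a control value is used to shift the weights; it does not fix this encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    weight_register: Optional[float] = None  # latched weight, used by the multiplier
    control_register: Optional[int] = None   # ASSUMED meaning: hops left for the passing weight
    in_transit: Optional[float] = None       # weight currently passing through the cell

    def product(self, activation: float) -> float:
        # Each cell multiplies an activation input by its stored weight.
        return activation * self.weight_register


def load_column(column: List[Cell], weights: List[float]) -> int:
    """Shift weights[r] into column[r]; return the number of simulated clock cycles."""
    n = len(column)
    assert len(weights) == n
    # The (hypothetical) sequencer injects the farthest row's weight first,
    # tagged with the number of hops it still has to travel.
    pending = [(weights[r], r) for r in reversed(range(n))]
    cycles = 0
    while pending or any(c.in_transit is not None for c in column):
        # Walk bottom-up so every weight advances exactly one cell per cycle.
        for r in reversed(range(n)):
            cell = column[r]
            if cell.in_transit is None:
                continue
            if cell.control_register == 0:
                cell.weight_register = cell.in_transit  # destination reached: latch
            else:
                below = column[r + 1]
                below.in_transit = cell.in_transit
                below.control_register = cell.control_register - 1
            cell.in_transit = None
            cell.control_register = None
        if pending:
            weight, hops = pending.pop(0)
            column[0].in_transit = weight
            column[0].control_register = hops
        cycles += 1
    return cycles


column = [Cell() for _ in range(3)]
cycles = load_column(column, [1.0, 2.0, 3.0])
print(cycles, [c.weight_register for c in column])
```

In this model the top cell of the column plays the role of the "distinct cell along the first dimension" that receives weights and control values, and the downward movement stands in for the shift along the second dimension.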
20 Claims
1. A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising:
- a hardware matrix computation unit comprising circuitry for a systolic array, the systolic array comprising a plurality of cells, each cell of the plurality of cells comprising a weight register disposed within the cell for storing weight inputs received from a source external to the cell;
- hardware circuitry for a weight fetcher unit configured to, for each of the plurality of neural network layers:
  - send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and
- hardware circuitry for a plurality of weight sequencer units that are disposed external to each cell of the plurality of cells, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, each of the plurality of weight sequencer units configured to, for each of the plurality of neural network layers:
  - provide a control value for storage in a control register disposed within the distinct cell coupled to the weight sequencer unit, the control value being used to shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, where each weight input is stored inside a respective cell using the weight register and along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for performing neural network computations for a neural network comprising a plurality of layers, the method comprising, for each of the plurality of neural network layers:
- sending, at a weight fetcher unit and to a hardware matrix computation unit comprising circuitry for a systolic array, a plurality of weight inputs to cells along a first dimension of the systolic array comprising a plurality of cells;
- storing the plurality of weight inputs in respective weight registers disposed within each cell of the plurality of cells along the first dimension of the systolic array; and
- providing, using hardware circuitry for each of a plurality of weight sequencer units, a control value for storage in a control register disposed within a particular cell along the first dimension of the systolic array, wherein the control value causes the plurality of weight inputs to shift to cells along a second dimension of the systolic array over a plurality of clock cycles, where each weight sequencer unit is disposed external to each cell of the plurality of cells and coupled to a distinct cell along the first dimension of the systolic array, where each weight input is stored inside a respective cell using the weight register and along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
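As a rough illustration of the per-layer sequence in claim 11, the following sketch walks through sending, shifting, and multiplying for each layer in turn. It rests on several assumptions that the claim does not state: the function names are hypothetical, weights enter one row per simulated clock cycle, the activation for row `r` is presented to every cell in that row, and the products are returned as-is without the accumulation or activation function a full processor would apply.

```python
from typing import List

Matrix = List[List[float]]


def shift_in_weights(layer_weights: Matrix) -> Matrix:
    """Shift one layer's weights into the array, one row per simulated clock cycle."""
    rows, cols = len(layer_weights), len(layer_weights[0])
    registers: Matrix = [[0.0] * cols for _ in range(rows)]
    # The row destined for the far edge is sent first; on every cycle each row
    # of registers moves one step along the second dimension.
    for row in reversed(layer_weights):
        registers = [list(row)] + registers[:-1]
    return registers


def cell_products(registers: Matrix, activations: List[float]) -> Matrix:
    """Each cell multiplies its stored weight by the activation for its row (assumed routing)."""
    return [[activations[r] * w for w in registers[r]] for r in range(len(registers))]


def run_layers(all_layer_weights: List[Matrix], layer_activations: List[List[float]]) -> List[Matrix]:
    """Repeat the send / shift / multiply sequence once per neural network layer."""
    results = []
    for layer_weights, activations in zip(all_layer_weights, layer_activations):
        registers = shift_in_weights(layer_weights)
        results.append(cell_products(registers, activations))
    return results


layer_1 = [[1.0, 2.0], [3.0, 4.0]]
layer_2 = [[5.0, 6.0], [7.0, 8.0]]
print(run_layers([layer_1, layer_2], [[10.0, 100.0], [1.0, 2.0]]))
```

The whole-row shift used here is a deliberate simplification; it omits the per-cell control register and only shows the order of steps across layers.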
Specification