Prefetching weights for use in a neural network processor

US 9,805,304 B2
Filed: 12/22/2016
Issued: 10/31/2017
Est. Priority Date: 05/21/2015
Status: Active Grant

Abstract

A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
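The data flow in the abstract can be sketched in software. The following is a minimal functional model, not the patented circuit: the function names `shift_in_weights` and `multiply_accumulate` are hypothetical, rows are assumed to be the first dimension and columns the second, and activation timing is collapsed into a single pass rather than modeled cycle by cycle.

```python
def shift_in_weights(weight_rows, num_rows):
    """Shift weight rows into a grid through its entry row, one row per clock cycle."""
    num_cols = len(weight_rows[0])
    grid = [[0] * num_cols for _ in range(num_rows)]
    # Send the last row first so it ends up farthest from the entry edge.
    for row in reversed(weight_rows):
        # Each cell hands its weight to the next cell along the second dimension.
        for r in range(num_rows - 1, 0, -1):
            grid[r] = grid[r - 1]
        # The entry row receives the newly fetched weights.
        grid[0] = list(row)
    return grid


def multiply_accumulate(grid, activations):
    """Multiply each stored weight by its row's activation and sum down each column."""
    sums = [0] * len(grid[0])
    for r, row in enumerate(grid):
        for c, weight in enumerate(row):
            sums[c] += activations[r] * weight
    return sums


grid = shift_in_weights([[1, 2], [3, 4]], num_rows=2)
print(grid)                                  # [[1, 2], [3, 4]]
print(multiply_accumulate(grid, [10, 20]))   # [70, 100]
```

In this sketch the weights take one loop iteration per row to reach their final cells, which stands in for the "plurality of clock cycles" over which the weights are shifted.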

20 Claims

1. A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising:
- a hardware matrix computation unit comprising circuitry for a systolic array, the systolic array comprising a plurality of cells, each cell of the plurality of cells comprising a weight register disposed within the cell for storing weight inputs received from a source external to the cell;
- hardware circuitry for a weight fetcher unit configured to, for each of the plurality of neural network layers:
  - send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and
- hardware circuitry for a plurality of weight sequencer units that are disposed external to each cell of the plurality of cells, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, each of the plurality of weight sequencer units configured to, for each of the plurality of neural network layers:
  - provide a control value for storage in a control register disposed within the distinct cell coupled to the weight sequencer unit, the control value being used to shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, where each weight input is stored inside a respective cell using the weight register and along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The circuit of claim 1, further comprising:
    - a value sequencer unit configured to, for each of the plurality of neural network layers, send a plurality of activation inputs to cells along the second dimension of the systolic array for the neural network layer.
  - 3. The circuit of claim 1, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 4. The circuit of claim 1, where each cell is configured to pass a weight control signal to an adjacent cell, the weight control signal causing circuitry in the adjacent cell to shift or load a weight input for the adjacent cell.
  - 5. The circuit of claim 1, where each cell comprises hardware circuitry for:
    - a weight path register disposed within the cell and coupled to the weight register, the weight path register configured to store the weight input shifted to the cell;
    - a weight control register disposed within the cell and for storing at least the control value provided by the weight sequencer or a weight control signal passed by an adjacent cell, the weight control register being configured to determine whether to store the weight input in the weight register;
    - an activation register disposed within the cell and configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the first dimension;
    - the multiplication circuitry disposed within the cell and coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input;
    - summation circuitry disposed within the cell and coupled to the multiplication circuitry and configured to receive the product and a first partial sum from a second adjacent cell along the second dimension, where the summation circuitry is configured to output a second partial sum of the product and the first partial sum; and
    - a partial sum register disposed within the cell and coupled to the summation circuitry and configured to store the second partial sum, the partial sum register configured to send the second partial sum to another summation circuitry in a third adjacent cell along the second dimension.
  - 6. The circuit of claim 5, where each weight sequencer unit comprises:
    - a pause counter corresponding to the weight control register within the corresponding cell coupled to the weight sequencer unit; and
    - decrement circuitry, the decrement circuitry configured to decrement an input to the weight sequencer unit to generate a decremented output and send the decremented output to the pause counter.
  - 7. The circuit of claim 6, where values in each pause counter are the same, and each weight sequencer unit is configured to load a corresponding weight input into the corresponding distinct cell of the systolic array, where the loading comprises sending the weight input to the multiplication circuitry.
  - 8. The circuit of claim 6, where values in each pause counter are different, and each weight sequencer unit is configured to shift a corresponding weight input into an adjacent weight sequencer unit along the second dimension.
  - 9. The circuit of claim 6, where values in each pause counter reaches a predetermined value to cause the plurality of weight sequencer units to pause shifting the plurality of weight inputs along the second dimension.
  - 10. The circuit of claim 1, where the systolic array is configured to, for each of the plurality of neural network layers, generate an accumulated output for the neural network layer from each product.
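
The per-cell elements recited in claim 5 can be pictured with a small software model. This is a simplified sketch under stated assumptions, not the claimed hardware: the class name `Cell`, its field names, and the `step` method are hypothetical, the weight control register is reduced to a single boolean, and one `step` call latches and forwards values in a single pass instead of modeling true register timing.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    weight: int = 0            # weight register: operand used by the multiplier
    weight_path: int = 0       # weight path register: weight shifted into the cell
    load_weight: bool = False  # simplified stand-in for the weight control register
    activation: int = 0        # activation register
    partial_sum: int = 0       # partial sum register

    def step(self, weight_in, load, activation_in, partial_sum_in):
        """Process one set of inputs and return the values passed to neighbors."""
        self.weight_path = weight_in
        self.load_weight = load
        # The control value decides whether the shifted weight becomes the operand.
        if self.load_weight:
            self.weight = self.weight_path
        self.activation = activation_in
        # Multiplication circuitry followed by summation circuitry.
        product = self.weight * self.activation
        self.partial_sum = partial_sum_in + product
        # Weight and partial sum continue along the second dimension,
        # the activation continues along the first dimension.
        return self.weight_path, self.activation, self.partial_sum


cell = Cell()
print(cell.step(weight_in=3, load=True, activation_in=4, partial_sum_in=5))   # (3, 4, 17)
print(cell.step(weight_in=9, load=False, activation_in=2, partial_sum_in=0))  # (9, 2, 6)
```

The second call shows the separation between the two weight registers: a new weight (9) passes through the weight path while the multiplier keeps using the previously loaded weight (3).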

11. A method for performing neural network computations for a neural network comprising a plurality of layers, the method comprising, for each of the plurality of neural network layers:
- sending, at a weight fetcher unit and to a hardware matrix computation unit comprising circuitry for a systolic array, a plurality of weight inputs to cells along a first dimension of the systolic array comprising a plurality of cells;
- storing the plurality of weight inputs in respective weight registers disposed within each cell of the plurality of cells along the first dimension of the systolic array; and
- providing, using hardware circuitry for each of a plurality of weight sequencer units, a control value for storage in a control register disposed within a particular cell along the first dimension of the systolic array, wherein the control value causes the plurality of weight inputs to shift to cells along a second dimension of the systolic array over a plurality of clock cycles, where each weight sequencer unit is disposed external to each cell of the plurality of cells and coupled to a distinct cell along the first dimension of the systolic array, where each weight input is stored inside a respective cell using the weight register and along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method of claim 11, further comprising:
    - sending, at a value sequencer unit, a plurality of activation inputs to cells along the second dimension of the systolic array for the neural network layer.
  - 13. The method of claim 11, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 14. The method of claim 11, further comprising passing, for each cell, a weight control signal to an adjacent cell, the weight control signal causing circuitry in the adjacent cell to shift or load a weight input for the adjacent cell.
  - 15. The method of claim 11, where each cell comprises hardware circuitry for:
    - a weight path register disposed within the cell and coupled to the weight register, the weight path register configured to store the weight input shifted to the cell;
    - a weight control register disposed within the cell and for storing at least the control value provided by the weight sequencer or a weight control signal passed by an adjacent cell, the weight control register being configured to determine whether to store the weight input in the weight register;
    - an activation register disposed within the cell and configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the first dimension;
    - the multiplication circuitry disposed within the cell and coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input;
    - summation circuitry disposed within the cell and coupled to the multiplication circuitry and configured to receive the product and a first partial sum from a second adjacent cell along the second dimension, where the summation circuitry is configured to output a second partial sum of the product and the first partial sum; and
    - a partial sum register disposed within the cell and coupled to the summation circuitry and configured to store the second partial sum, the partial sum register configured to send the second partial sum to another summation circuitry in a third adjacent cell along the second dimension.
  - 16. The method of claim 15, further comprising:
    - decrementing, at decrement circuitry in each weight sequencer unit, a respective input to the weight sequencer unit to generate a respective decremented output;
    - sending, for each weight sequencer unit, the respective decremented output to a respective pause counter, the respective pause counter corresponding to the weight control register within the corresponding cell coupled to the weight sequencer unit.
  - 17. The method of claim 16, where values in each pause counter are the same, and further comprising loading, at each weight sequencer unit, a corresponding weight input into the corresponding distinct cell of the systolic array, where the loading comprises sending the weight input to the multiplication circuitry.
  - 18. The method of claim 16, where values in each pause counter are different, and further comprising shifting, at each weight sequencer unit, a corresponding weight input into an adjacent weight sequencer unit along the second dimension.
  - 19. The method of claim 16, where values in each pause counter reaches a predetermined value to cause the plurality of weight sequencer units to pause shifting the plurality of weight inputs along the second dimension.
  - 20. The method of claim 11, further comprising generating, at the systolic array for each of the plurality of neural network layers, a respective accumulated output for the neural network layer from each product.
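
One possible reading of the decrement and pause-counter steps in claims 16 to 19 is sketched below. The claims do not say where each weight sequencer unit's input comes from or which value triggers a pause, so this sketch assumes (hypothetically) that the sequencers form a chain in which each unit receives the previous unit's decremented output, and that the predetermined pause value is 0. The function names `propagate_control_value` and `is_paused` are illustrative only.

```python
def propagate_control_value(control_value, num_sequencers):
    """Pass a control value down an assumed chain of weight sequencer units."""
    pause_counters = []
    value = control_value
    for _ in range(num_sequencers):
        value -= 1                    # decrement circuitry
        pause_counters.append(value)  # decremented output goes to the pause counter
    return pause_counters


def is_paused(pause_counter, pause_value=0):
    """Report whether a counter has reached the assumed predetermined value."""
    return pause_counter == pause_value


counters = propagate_control_value(control_value=3, num_sequencers=3)
print(counters)                            # [2, 1, 0]
print([is_paused(c) for c in counters])    # [False, False, True]
```

Under these assumptions the counters hold different values while the control value is still propagating, which corresponds to the shifting case of claim 18; the sketch does not model the equal-counter loading case of claim 17.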

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ross, Jonathan
Primary Examiner(s): Gonzales, Vincent
Application Number: US15/389,273
Publication Number: US 20170103314A1
Time in Patent Office: 313 Days

CPC Class Codes

G06F 15/8046 Systolic arrays

G06N 3/063 using electronic means
