PREFETCHING WEIGHTS FOR USE IN A NEURAL NETWORK PROCESSOR

US 20160342892A1
Filed: 09/03/2015
Published: 11/24/2016
Est. Priority Date: 05/21/2015
Status: Active Grant

Abstract

A circuit for performing neural network computations for a neural network, the circuit comprising: a systolic array comprising a plurality of cells; a weight fetcher unit configured to, for each of the plurality of neural network layers: send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers: shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
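The shifting step in the abstract can be pictured with a short simulation. This is a minimal sketch under stated assumptions, not the patented circuit: it assumes the weights enter at the top row of the array and move down each column (the row/column mapping of dependent claim 3), that one row of weights enters per clock cycle, and that rows are fed in reverse order so each weight comes to rest in its matching cell. The name `prefetch_weights` and the list-of-lists representation are hypothetical.

```python
def prefetch_weights(weights):
    """Shift a rows x cols weight matrix into a grid of weight path registers.

    Assumption: one row of weights enters the top edge per clock cycle and
    every register passes its value to the cell below on each cycle.
    Returns the final register contents and the number of cycles used.
    """
    rows, cols = len(weights), len(weights[0])
    path = [[None] * cols for _ in range(rows)]  # weight path registers

    cycles = 0
    for cycle in range(rows):
        # Shift every stored weight one cell further down its column.
        for r in range(rows - 1, 0, -1):
            path[r] = list(path[r - 1])
        # Feed the next row at the top edge, last matrix row first.
        path[0] = list(weights[rows - 1 - cycle])
        cycles += 1

    return path, cycles


if __name__ == "__main__":
    registers, used = prefetch_weights([[1, 2], [3, 4], [5, 6]])
    print(registers, used)
```

In this sketch a matrix with N rows needs N clock cycles before every cell holds its weight, which is why it helps to move weights while the array is still busy with earlier work.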

53 Citations

20 Claims

1. A circuit for performing neural network computations for a neural network comprising a plurality of layers, the circuit comprising:
- a systolic array comprising a plurality of cells;
  
  a weight fetcher unit configured to, for each of the plurality of neural network layers:
  
  send, for the neural network layer, a plurality of weight inputs to cells along a first dimension of the systolic array; and
  
  a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight sequencer units configured to, for each of the plurality of neural network layers:
  
  shift, for the neural network layer, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, where each weight input is stored inside a respective cell along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
  - 2. The circuit of claim 1, further comprising:
    - a value sequencer unit configured to, for each of the plurality of neural network layers, send a plurality of activation inputs to cells along the second dimension of the systolic array for the neural network layer.
  - 3. The circuit of claim 1, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 4. The circuit of claim 1, where each cell is configured to pass a weight control signal to an adjacent cell, the weight control signal causing circuitry in the adjacent cell to shift or load a weight input for the adjacent cell.
  - 5. The circuit of claim 1, where each cell comprises:
    - a weight path register configured to store the weight input shifted to the cell;
      
      a weight register coupled to the weight path register;
      
      a weight control register configured to determine whether to store the weight input in the weight register;
      
      an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the first dimension;
      
      the multiplication circuitry coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input;
      
      summation circuitry coupled to the multiplication circuitry and configured to receive the product and a first partial sum from a second adjacent cell along the second dimension, where the summation circuitry is configured to output a second partial sum of the product and the first partial sum; and
      
      a partial sum register coupled to the summation circuitry and configured to store the second partial sum, the partial sum register configured to send the second partial sum to another summation circuitry in a third adjacent cell along the second dimension.
  - 6. The circuit of claim 5, where each weight sequencer unit comprises:
    - a pause counter corresponding to the weight control register within the corresponding cell coupled to the weight sequencer unit; and
      
      decrement circuitry, the decrement circuitry configured to decrement an input to the weight sequencer unit to generate a decremented output and send the decremented output to the pause counter.
  - 7. The circuit of claim 6, where values in each pause counter are the same, and each weight sequencer unit is configured to load a corresponding weight input into the corresponding distinct cell of the systolic array, where the loading comprises sending the weight input to the multiplication circuitry.
  - 8. The circuit of claim 6, where values in each pause counter are different, and each weight sequencer unit is configured to shift a corresponding weight input into an adjacent weight sequencer unit along the second dimension.
  - 9. The circuit of claim 6, where values in each pause counter reaches a predetermined value to cause the plurality of weight sequencer units to pause shifting the plurality of weight inputs along the second dimension.
  - 10. The circuit of claim 1, where the systolic array is configured to, for each of the plurality of neural network layers, generate an accumulated output for the neural network layer from each product.
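The registers listed in claim 5 can be modelled in a few lines to show how a shifted weight differs from a loaded weight. This is a minimal sketch, not the claimed hardware: the class name `Cell`, the method `step`, the integer values, and the choice to make a newly loaded weight usable in the same cycle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical software model of one systolic-array cell (claim 5)."""

    weight_path: int = 0       # weight path register: weight passing through
    weight: int = 0            # weight register: weight used for multiplication
    load_weight: bool = False  # weight control register: load or only shift
    activation: int = 0        # activation register
    partial_sum: int = 0       # partial sum register

    def step(self, weight_in, load, activation_in, partial_sum_in):
        """Advance one clock cycle and return the new partial sum."""
        self.weight_path = weight_in
        self.load_weight = load
        # Only copy the shifted weight into the weight register when told to.
        if self.load_weight:
            self.weight = self.weight_path
        self.activation = activation_in
        product = self.weight * self.activation         # multiplication circuitry
        self.partial_sum = partial_sum_in + product     # summation circuitry
        return self.partial_sum


if __name__ == "__main__":
    cell = Cell()
    cell.step(weight_in=3, load=True, activation_in=0, partial_sum_in=0)
    # A new weight (9) shifts through while the cell keeps computing with 3.
    print(cell.step(weight_in=9, load=False, activation_in=2, partial_sum_in=10))
```

Keeping the weight path register separate from the weight register is what lets the next set of weights move through a cell without disturbing the product it is currently computing.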

11. A method for performing neural network computations for a neural network comprising a plurality of layers, the method comprising, for each of the plurality of neural network layers:
- sending, at a weight fetcher unit, a plurality of weight inputs to cells along a first dimension of a systolic array comprising a plurality of cells;
  
  shifting, at each of a plurality of weight sequencer units, each weight sequencer unit coupled to a distinct cell along the first dimension of the systolic array, the plurality of weight inputs to cells along the second dimension of the systolic array over a plurality of clock cycles, where each weight input is stored inside a respective cell along the second dimension, and where each cell is configured to compute a product of an activation input and a respective weight input using multiplication circuitry.
  - 12. The method of claim 11, further comprising:
    - sending, at a value sequencer unit, a plurality of activation inputs to cells along the second dimension of the systolic array for the neural network layer.
  - 13. The method of claim 11, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
  - 14. The method of claim 11, further comprising passing, for each cell, a weight control signal to an adjacent cell to the cell, the weight control signal causing circuitry in the adjacent cell to shift or load a weight input for the adjacent cell.
  - 15. The method of claim 11, where each cell comprises:
    - a weight path register configured to store the weight input shifted to the cell;
      
      a weight register coupled to the weight path register;
      
      a weight control register configured to determine whether to store the weight input in the weight register;
      
      an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the first dimension;
      
      the multiplication circuitry coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input;
      
      summation circuitry coupled to the multiplication circuitry and configured to receive the product and a first partial sum from a second adjacent cell along the second dimension, where the summation circuitry is configured to output a second partial sum of the product and the first partial sum; and
      
      a partial sum register coupled to the summation circuitry and configured to store the second partial sum, the partial sum register configured to send the second partial sum to another summation circuitry in a third adjacent cell along the second dimension.
  - 16. The method of claim 15, further comprising:
    - decrementing, at decrement circuitry in each weight sequencer unit, a respective input to the weight sequencer unit to generate a respective decremented output;
      
      sending, for each weight sequencer unit, the respective decremented output to a respective pause counter, the respective pause counter corresponding to the weight control register within the corresponding cell coupled to the weight sequencer unit.
  - 17. The method of claim 16, where values in each pause counter are the same, and further comprising loading, at each weight sequencer unit, a corresponding weight input into the corresponding distinct cell of the systolic array, where the loading comprises sending the weight input to the multiplication circuitry.
  - 18. The method of claim 16, where values in each pause counter are different, and further comprising shifting, at each weight sequencer unit, a corresponding weight input into an adjacent weight sequencer unit along the second dimension.
  - 19. The method of claim 16, where values in each pause counter reaches a predetermined value to cause the plurality of weight sequencer units to pause shifting the plurality of weight inputs along the second dimension.
  - 20. The method of claim 11, further comprising generating, at the systolic array for each of the plurality of neural network layers, a respective accumulated output for the neural network layer from each product.
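Once the weights are in place, the accumulated output of claims 10 and 20 amounts to summing products down each column. The sketch below assumes the claim 3 and claim 13 mapping (activations travel along rows, partial sums travel down columns) and ignores clock-cycle skew; `accumulate_outputs` and its arguments are hypothetical names, not part of the patent.

```python
def accumulate_outputs(weights, activations):
    """Compute one accumulated output per column of a weight-stationary array.

    Assumption: activations[r] is presented to every cell in row r, and each
    cell adds its product to the partial sum received from the cell above.
    """
    rows, cols = len(weights), len(weights[0])
    if len(activations) != rows:
        raise ValueError("need one activation input per row")

    outputs = []
    for c in range(cols):
        partial_sum = 0  # value entering the top cell of the column
        for r in range(rows):
            # Product from the cell, added to the incoming partial sum.
            partial_sum += activations[r] * weights[r][c]
        outputs.append(partial_sum)  # value leaving the bottom cell
    return outputs


if __name__ == "__main__":
    print(accumulate_outputs([[1, 2], [3, 4]], [10, 1]))
```

The result is the vector-matrix product of the activation inputs and the stored weights, with each column of the array producing one element.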

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ross, Jonathan
Granted Patent: US 10,049,322 B2
US Class Current: 1/1
CPC Class Codes:
G06F 15/8046 Systolic arrays
G06N 3/063 using electronic means
