Low latency matrix multiply unit

US 10,698,974 B2
Filed: 05/17/2018
Issued: 06/30/2020
Est. Priority Date: 05/17/2017
Status: Active Grant

Abstract

Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply unit includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
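The cell described in the abstract can be pictured with a small software model. The following is only an illustrative sketch, not the patented hardware: the class name `Cell`, its field names, and the method names are all hypothetical, and a Python float stands in for whatever number format the real unit uses.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical software model of one cell; all names are assumptions."""

    transposed_shift: float = 0.0      # transposed weight shift register (horizontal input)
    non_transposed_shift: float = 0.0  # non-transposed weight shift register (vertical input)
    weight: float = 0.0                # weight matrix register

    def shift_horizontal(self, incoming: float) -> float:
        """Take a weight from the horizontal neighbor; return the value shifted out."""
        outgoing, self.transposed_shift = self.transposed_shift, incoming
        return outgoing

    def shift_vertical(self, incoming: float) -> float:
        """Take a weight from the vertical neighbor; return the value shifted out."""
        outgoing, self.non_transposed_shift = self.non_transposed_shift, incoming
        return outgoing

    def load_weight(self, use_transposed: bool) -> None:
        """Copy one of the two shift registers into the weight matrix register."""
        self.weight = self.transposed_shift if use_transposed else self.non_transposed_shift

    def multiply(self, vector_input: float) -> float:
        """Multiply the stored weight with one element of the vector data input."""
        return self.weight * vector_input


cell = Cell()
cell.shift_horizontal(2.0)   # weight arriving from the horizontal direction
cell.shift_vertical(3.0)     # weight arriving from the vertical direction
cell.load_weight(use_transposed=True)
product_transposed = cell.multiply(4.0)
cell.load_weight(use_transposed=False)
product_non_transposed = cell.multiply(4.0)
```

In this model the two shift registers are independent, so a weight can arrive from either direction without disturbing the value currently held in the weight matrix register.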

13 Claims

1. A matrix multiply unit configured to perform neural network computations of a neural network, the matrix multiply unit implemented as a systolic array of cells, the systolic array of cells arranged in a two-dimensional format, each cell of the array of cells comprising:
- a weight matrix register configured to receive one of a first weight input of the neural network from a transposed weight shift register and a second weight input of the neural network from a non-transposed weight shift register;
- the transposed weight shift register configured to receive the first weight input from a first direction of the two-dimensional format to be stored in the weight matrix register;
- the non-transposed weight shift register configured to receive the second weight input from a second direction of the two-dimensional format to be stored in the weight matrix register, the second direction being perpendicular to the first direction; and
- a multiply unit that is coupled to the weight matrix register and configured to multiply the received weight input with a vector data input of the neural network to obtain a multiplication result.
2. The matrix multiply unit of claim 1, wherein each cell further comprises:
- a multiplexer configured to select between the first weight input and the second weight input, and forward the selected weight input to the weight matrix register.

3. The matrix multiply unit of claim 1, further comprising a first weight holding register configured to hold one of the first weight input and the second weight input.

4. The matrix multiply unit of claim 3, further comprising a second weight holding register configured to hold the other one of the first weight input and the second weight input.

5. The matrix multiply unit of claim 4, wherein the first weight input is loaded from the transposed weight shift register into the first weight holding register and the second weight input is loaded from the second direction into the second weight holding register.

6. The matrix multiply unit of claim 5, wherein the weight matrix register is loaded with one of the first weight input and the second weight input from one of the first weight holding register and the second weight holding register.

7. The matrix multiply unit of claim 6, wherein data in the weight matrix register is used in any number of cycles of multiplications.

8. The matrix multiply unit of claim 7, wherein during the number of cycles of multiplications, additional weight inputs are shifted into the weight shift registers in preparation for a next set of multiplications.

9. The matrix multiply unit of claim 7, wherein during the number of cycles of multiplications, another weight input stored in the weight matrix register is multiplied with another vector data input in order to obtain another multiplication result.

10. The matrix multiply unit of claim 1, wherein the vector data input moves by one multi-cell per clock cycle.

11. The matrix multiply unit of claim 1, wherein the first weight input and the second weight input are shifted based on instructions when the instructions are received.

12. The matrix multiply unit of claim 1, wherein the transposed weight shift register is physically separate from the non-transposed weight shift register.

13. The matrix multiply unit of claim 1, wherein:
- the first weight input is received on a first wired path from another cell in the array that is along the first direction; and
- the second weight input is received on a second wired path from another cell in the array that is along the second direction.
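One way to see why two perpendicular shift directions are useful is to model the whole array in software. The sketch below is a simplified illustration under several assumptions that the claims do not state: the array is square, one full vector of weights enters per step, the same weight stream is fed from either edge, and the per-cell products are summed along each column. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def shift_in_vertical(stream: Matrix) -> Matrix:
    """Shift weight vectors in from the top edge, one row per step (assumed model)."""
    n = len(stream)
    regs = [[0.0] * n for _ in range(n)]
    for vec in stream:
        # Existing rows move one step down; the new vector enters as the top row.
        regs = [list(vec)] + regs[:-1]
    return regs


def shift_in_horizontal(stream: Matrix) -> Matrix:
    """Shift the same weight vectors in from the left edge, one column per step."""
    n = len(stream)
    regs = [[0.0] * n for _ in range(n)]
    for vec in stream:
        # Existing columns move one step right; the new vector enters as the left column.
        regs = [[vec[r]] + regs[r][:-1] for r in range(n)]
    return regs


def load_and_multiply(stream: Matrix, x: List[float], transposed: bool) -> List[float]:
    """Select a shift path, latch the weights, and multiply with the vector input."""
    regs = shift_in_horizontal(stream) if transposed else shift_in_vertical(stream)
    weights = [row[:] for row in regs]  # copy into the weight matrix registers
    n = len(x)
    # Each cell forms x[r] * weights[r][c]; summing down a column is an assumption here.
    return [sum(x[r] * weights[r][c] for r in range(n)) for c in range(n)]


W = [[1.0, 2.0], [3.0, 4.0]]
stream = W[::-1]  # last row enters first so it ends up farthest from the entry edge
stored_plain = shift_in_vertical(stream)
stored_transposed = shift_in_horizontal(stream)
y_plain = load_and_multiply(stream, [1.0, 1.0], transposed=False)
y_transposed = load_and_multiply(stream, [1.0, 1.0], transposed=True)
```

In this model, feeding an identical weight stream from the vertical edge leaves `W` in the array, while feeding it from the horizontal edge leaves the transpose of `W`, so no separate transposition pass over the data is needed.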

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Phelps, Andrew Everett; Jouppi, Norman Paul
Primary Examiner(s): Sandifer, Matthew D
Application Number: US15/983,037
Publication Number: US 20180336163A1
Time in Patent Office: 775 Days
CPC Class Codes

G06F 15/8046   Systolic arrays

G06F 17/16   Matrix or vector computatio...

G06F 5/015   having at least two separat...

G06F 7/5443   Sum of products for applica...

G06F 9/30032   Movement instructions, e.g....

G06F 9/30036   Instructions to perform ope...

G06F 9/30101   Special purpose registers

G06N 3/04   Architecture, e.g. intercon...

G06N 3/063   using electronic means

G06N 3/08   Learning methods
