Low latency matrix multiply unit

US 10,698,976 B2
Filed: 08/01/2019
Issued: 06/30/2020
Est. Priority Date: 05/17/2017
Status: Active Grant


Abstract

Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. Each cell of the matrix multiply unit includes: a weight matrix register configured to receive a weight input from either a transposed or a non-transposed weight shift register; a transposed weight shift register configured to receive a weight input from a horizontal direction to be stored in the weight matrix register; a non-transposed weight shift register configured to receive a weight input from a vertical direction to be stored in the weight matrix register; and a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
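The cell described in the abstract can be pictured with a small software model. The sketch below is illustrative only and is not taken from the patent: the class name `Cell`, its method names, and the use of a boolean flag as the multiplexer select are all assumptions, and real hardware would be clocked rather than called method by method. It shows the two shift registers feeding one weight matrix register, and the multiplier continuing to use the held weight while a new weight arrives.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Toy model of one systolic-array cell (all names are hypothetical)."""

    weight_matrix_reg: float = 0.0         # weight currently used by the multiplier
    transposed_shift_reg: float = 0.0      # fed from the horizontal direction
    non_transposed_shift_reg: float = 0.0  # fed from the vertical direction

    def shift_horizontal(self, w: float) -> float:
        """Accept a weight from the horizontal neighbor; return the old value to pass on."""
        old, self.transposed_shift_reg = self.transposed_shift_reg, w
        return old

    def shift_vertical(self, w: float) -> float:
        """Accept a weight from the vertical neighbor; return the old value to pass on."""
        old, self.non_transposed_shift_reg = self.non_transposed_shift_reg, w
        return old

    def load_weight(self, use_transposed: bool) -> None:
        """Multiplexer step: copy one shift register into the weight matrix register."""
        self.weight_matrix_reg = (
            self.transposed_shift_reg if use_transposed else self.non_transposed_shift_reg
        )

    def multiply(self, x: float) -> float:
        """Multiply the held weight by one element of the vector data input."""
        return self.weight_matrix_reg * x


cell = Cell()
cell.shift_vertical(2.0)
cell.shift_horizontal(5.0)

cell.load_weight(use_transposed=False)
first = cell.multiply(3.0)   # uses the vertically loaded weight 2.0

cell.shift_vertical(7.0)     # the next weight arrives while 2.0 is still in use
second = cell.multiply(4.0)  # the weight matrix register still holds 2.0

cell.load_weight(use_transposed=True)
third = cell.multiply(1.0)   # now uses the horizontally loaded weight 5.0
```

Keeping the shift registers separate from the weight matrix register is what lets the next set of weights be shifted in without disturbing multiplications that are still in progress.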


12 Claims

1. A matrix multiply unit configured to perform neural network computations of a neural network, the matrix multiply unit implemented as a systolic array of cells, the systolic array of cells arranged in a two-dimensional format, each cell of the array of cells comprising:
- a weight matrix register configured to receive a weight input of the neural network from one or more weight storing registers;
  
  the one or more weight storing registers, wherein the one or more weight storing registers are configured to receive weight inputs of the neural network to be stored in the weight matrix register from both a first direction of the two-dimensional format and a second direction of the two-dimensional format, the second direction being different from the first direction; and
  
  a multiply unit that is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input of the neural network in order to obtain a multiplication result.
- Dependent claims (2 to 12):
  - 2. The matrix multiply unit of claim 1, wherein each cell further comprises:
    - a multiplexer configured to:
      
      select the weight input from the weight inputs received from the first direction and the second direction; and
      
      send the selected weight input to the weight matrix register.
  - 3. The matrix multiply unit of claim 1, further comprising a first weight holding register configured to hold the weight input from the first direction.
  - 4. The matrix multiply unit of claim 3, further comprising a second weight holding register configured to hold the weight input from the second direction.
  - 5. The matrix multiply unit of claim 4, wherein the weight input from the first direction is loaded from the one or more weight storing registers into the first weight holding register, wherein the weight input from the second direction is loaded from the one or more weight storing registers into the second weight holding register.
  - 6. The matrix multiply unit of claim 5, wherein the weight matrix register is loaded with the one of the weight input from the first direction and the weight input from the second direction.
  - 7. The matrix multiply unit of claim 6, wherein the weight input in the weight matrix register is used in any number of cycles of multiplications.
  - 8. The matrix multiply unit of claim 7, wherein during the number of cycles of multiplications, additional weight inputs are shifted into the one or more weight storing registers in preparation for a subsequent set of one or more multiplications.
  - 9. The matrix multiply unit of claim 7, wherein during the number of cycles of multiplications, another weight input of the neural network is multiplied with another vector data input of the neural network to obtain another multiplication result.
  - 10. The matrix multiply unit of claim 1, wherein the vector data input moves by one multi-cell per clock cycle.
  - 11. The matrix multiply unit of claim 1, wherein the one or more weight storing registers comprise a transposed weight shift register and a non-transposed weight shift register that is physically separate from the transposed weight shift register.
  - 12. The matrix multiply unit of claim 1, wherein the one or more weight storing registers that are configured to receive the weights comprise:
    - a first weight storing register configured to receive a first weight input over a first wired path from another cell in the array that is along the first direction; and
      
      a second weight storing register configured to receive a second weight input over a second wired path from another cell in the array that is along the second direction, wherein the weight input is at least one of the first weight input or the second weight input.
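To see why receiving weights from two different directions is useful, consider shifting the same stream of weight vectors into a square array first along the vertical direction and then along the horizontal direction. The sketch below is a simplified illustration under stated assumptions, not the patented circuit: the function names `load_vertical` and `load_horizontal`, the one-vector-per-step shifting model, and the feed order are all hypothetical. Under those assumptions, one direction leaves the matrix in its original orientation and the other leaves its transpose.

```python
def load_vertical(stream):
    """Shift vectors in from the top edge; each step moves every row down by one."""
    n = len(stream)
    regs = [[0] * n for _ in range(n)]
    for vec in stream:
        # The top row takes the new vector; every other row takes the row above it.
        regs = [list(vec)] + [row[:] for row in regs[:-1]]
    return regs


def load_horizontal(stream):
    """Shift vectors in from the left edge; each step moves every column right by one."""
    n = len(stream)
    regs = [[0] * n for _ in range(n)]
    for vec in stream:
        # Column 0 takes the new vector; every other cell takes its left neighbor.
        regs = [[vec[r]] + regs[r][:-1] for r in range(n)]
    return regs


W = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Feed the last row first so that the first row ends up nearest the input edge.
stream = W[::-1]

vertical = load_vertical(stream)      # same orientation as W
horizontal = load_horizontal(stream)  # transpose of W

for row in vertical:
    print(row)
for row in horizontal:
    print(row)
```

In this model the transposed layout is obtained purely by choosing the shift direction, without rearranging the weight values before they are streamed in.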


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Phelps, Andrew Everett; Jouppi, Norman Paul
Primary Examiner(s): Sandifer, Matthew D

Application Number: US16/529,662
Publication Number: US 20190354571A1
Time in Patent Office: 334 Days

CPC Class Codes

G06F 15/8046   Systolic arrays

G06F 17/16   Matrix or vector computatio...

G06F 5/015   having at least two separat...

G06F 7/5443   Sum of products for applica...

G06F 9/30032   Movement instructions, e.g....

G06F 9/30036   Instructions to perform ope...

G06F 9/30101   Special purpose registers

G06N 3/04   Architecture, e.g. intercon...

G06N 3/063   using electronic means

G06N 3/08   Learning methods
