HETEROGENEOUS HARDWARE ACCELERATOR ARCHITECTURE FOR PROCESSING SPARSE MATRIX DATA WITH SKEWED NON-ZERO DISTRIBUTIONS

US 20180189239A1
Filed: 12/31/2016
Published: 07/05/2018
Est. Priority Date: 12/31/2016
Status: Active Grant

Abstract

Heterogeneous hardware accelerator architectures for processing sparse matrix data having skewed non-zero distributions are described. An accelerator includes sparse tiles to access data from a first memory over a high bandwidth interface and very/hyper sparse tiles to randomly access data from a second memory over a low-latency interface. The accelerator determines that one or more computational tasks involving a matrix are to be performed, partitions the matrix into a first plurality of blocks that includes one or more sparse sections of the matrix, and a second plurality of blocks that includes sections of the matrix that are very- or hyper-sparse. The accelerator causes the sparse tile(s) to perform one or more matrix operations for the computational task(s) using the first plurality of blocks and further causes the very/hyper sparse tile(s) to perform the one or more matrix operations for the computational task(s) using the second plurality of blocks.
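The flow in the abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the row-block partitioning, the density cutoff `hyper_threshold`, and every function name here are hypothetical and are not taken from the patent. Both tile types are modeled by the same plain-Python routine, whereas the real accelerator would use different hardware and memory interfaces for each.

```python
# Illustrative software model only; names, block size, and threshold are assumptions.
from typing import List, Tuple

Matrix = List[List[float]]
Block = Tuple[int, Matrix]  # (index of first row in the full matrix, rows of the block)


def split_row_blocks(matrix: Matrix, block_rows: int) -> List[Block]:
    """Cut the matrix into horizontal blocks, remembering each block's first row."""
    return [(start, matrix[start:start + block_rows])
            for start in range(0, len(matrix), block_rows)]


def density(block: Matrix) -> float:
    """Fraction of entries in the block that are non-zero."""
    total = sum(len(row) for row in block)
    nonzeros = sum(1 for row in block for v in row if v != 0)
    return nonzeros / total if total else 0.0


def partition(matrix: Matrix, block_rows: int, hyper_threshold: float):
    """Return (sparse_blocks, hyper_blocks) using a hypothetical density cutoff."""
    sparse, hyper = [], []
    for start, block in split_row_blocks(matrix, block_rows):
        target = hyper if density(block) < hyper_threshold else sparse
        target.append((start, block))
    return sparse, hyper


def tile_multiply(blocks: List[Block], x: List[float], y: List[float]) -> None:
    """Stand-in for a tile: accumulate (block times x) into the output vector y."""
    for start, block in blocks:
        for i, row in enumerate(block):
            y[start + i] += sum(v * x[j] for j, v in enumerate(row) if v != 0)


def spmv(matrix: Matrix, x: List[float],
         block_rows: int = 2, hyper_threshold: float = 0.2) -> List[float]:
    """Matrix-vector product computed from the two groups of blocks."""
    sparse, hyper = partition(matrix, block_rows, hyper_threshold)
    y = [0.0] * len(matrix)
    tile_multiply(sparse, x, y)  # would run on the sparse tiles
    tile_multiply(hyper, x, y)   # would run on the very/hyper sparse tiles
    return y
```

Because each block covers a disjoint set of rows, the two groups of blocks write to different entries of `y`, so the model's result does not depend on which group is processed first.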

20 Claims

1. A method in a hardware processor for processing sparse matrix data having a skewed non-zero distribution comprising:
- determining, by the hardware processor, that one or more computational tasks involving a matrix are to be performed;
- partitioning, by the hardware processor, the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
- causing, by the hardware processor, one or more sparse tiles of the hardware processor to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further causing one or more very/hyper sparse tiles of the hardware processor to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, further comprising determining, by the hardware processor, whether the matrix is sparse and has a skewed non-zero distribution, wherein said partitioning occurs responsive to determining that the matrix is sparse and does have the skewed non-zero distribution.
  - 3. The method of claim 1, wherein said partitioning comprises:
    - determining a number of rows or columns of the matrix having only zero values.
  - 4. The method of claim 3, wherein said partitioning comprises:
    - determining whether the number satisfies a threshold criteria.
  - 5. The method of claim 1, further comprising converting a format of each of the second plurality of blocks into a doubly-compressed format.
  - 6. The method of claim 1, wherein said causing the one or more sparse tiles to perform the one or more matrix operations comprises storing the first plurality of blocks in a first memory unit, the first memory unit to stream data of the first plurality of blocks to the one or more sparse tiles over a high-bandwidth interface.
  - 7. The method of claim 6, wherein said causing the one or more very/hyper sparse tiles to perform the one or more matrix operations comprises storing the second plurality of blocks in a second memory unit, the second memory unit to provide data of the second plurality of blocks to the one or more very/hyper sparse tiles responsive to random access requests from the one or more very/hyper sparse tiles over a low-latency interface.
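Claims 3 to 5 can be illustrated with a short sketch. The claims do not say what the threshold criterion is or which doubly-compressed layout is used, so the following are assumptions: the criterion is "at least a given fraction of rows is all-zero", and the layout is a doubly-compressed sparse row (DCSR) structure that stores index data only for rows containing a non-zero. All names are hypothetical.

```python
# Illustrative sketch; the threshold rule and the DCSR field names are assumptions.

def count_empty_rows(block):
    """Number of rows in the block that contain only zero values."""
    return sum(1 for row in block if all(v == 0 for v in row))


def is_hyper_sparse(block, min_empty_fraction=0.5):
    """Hypothetical threshold criterion: enough of the rows are entirely zero."""
    return count_empty_rows(block) >= min_empty_fraction * len(block)


def to_dcsr(block):
    """Convert a dense-listed block to a doubly-compressed sparse row layout.

    row_ids : indices of the rows that hold at least one non-zero
    row_ptr : row_ptr[k]..row_ptr[k+1] delimits the entries of row_ids[k]
    col_idx : column index of each stored value
    values  : the non-zero values, row by row
    """
    row_ids, row_ptr, col_idx, values = [], [0], [], []
    for i, row in enumerate(block):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        if not nonzeros:
            continue  # empty rows take no space at all in this layout
        row_ids.append(i)
        for j, v in nonzeros:
            col_idx.append(j)
            values.append(v)
        row_ptr.append(len(values))
    return {"row_ids": row_ids, "row_ptr": row_ptr,
            "col_idx": col_idx, "values": values}
```

Compared with an ordinary compressed sparse row layout, which keeps one pointer per row even when the row is empty, this layout's index storage grows with the number of non-empty rows, which is why it suits blocks dominated by all-zero rows.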

8. A hardware processor comprising:
- one or more sparse tiles comprising a first plurality of processing units to access data from a first memory unit over a high bandwidth interface;
- one or more very/hyper sparse tiles comprising a second plurality of processing units to randomly access data from a second memory unit over a low-latency interface; and
- a control unit to:
  - determine that one or more computational tasks involving a matrix are to be performed;
  - partition the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
  - cause the one or more sparse tiles to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further cause the one or more very/hyper sparse tiles to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The hardware processor of claim 8, wherein the control unit is further to determine whether the matrix is sparse and has a skewed non-zero distribution, wherein the control unit is to perform the partition responsive to a determination that the matrix is sparse and does have the skewed non-zero distribution.
  - 10. The hardware processor of claim 8, wherein, to partition the matrix into the first plurality of blocks and the second plurality of blocks, the control unit is to:
    - determine a number of rows or columns of the matrix having only zero values.
  - 11. The hardware processor of claim 10, wherein, to partition the matrix into the first plurality of blocks and the second plurality of blocks, the control unit is further to:
    - determine whether the number satisfies a threshold criteria.
  - 12. The hardware processor of claim 8, wherein the control unit is further to convert a format of each of the second plurality of blocks into a doubly-compressed format.
  - 13. The hardware processor of claim 8, wherein, to cause the one or more sparse tiles to perform the one or more matrix operations, the control unit is to cause the first plurality of blocks to be stored in a first memory unit, the first memory unit to stream data of the first plurality of blocks to the one or more sparse tiles over a high-bandwidth interface.
  - 14. The hardware processor of claim 13, wherein, to cause the one or more very/hyper sparse tiles to perform the one or more matrix operations, the control unit is to store the second plurality of blocks in a second memory unit, the second memory unit to provide data of the second plurality of blocks to the one or more very/hyper sparse tiles responsive to random access requests from the one or more very/hyper sparse tiles over a low-latency interface.
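The contrast between the streamed access of claim 13 and the random access of claim 14 can be sketched with two matrix-vector kernels. This is a software analogy only, not the claimed hardware: the compressed sparse row (CSR) kernel walks every row pointer in order, while the doubly-compressed (DCSR) kernel visits only the non-empty rows and gathers vector entries by index. The dictionary field names are assumptions chosen for this example.

```python
# Illustrative kernels; field names and formats are assumptions for this sketch.

def csr_spmv(row_ptr, col_idx, values, x):
    """Sequential pass over all rows, including rows with no stored entries."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[p] * x[col_idx[p]]
    return y


def dcsr_spmv(dcsr, x, num_rows):
    """Visit only non-empty rows; each x[...] lookup is an indexed (random) access."""
    y = [0.0] * num_rows
    for k, i in enumerate(dcsr["row_ids"]):
        for p in range(dcsr["row_ptr"][k], dcsr["row_ptr"][k + 1]):
            y[i] += dcsr["values"][p] * x[dcsr["col_idx"][p]]
    return y


# The same 4x4 block in both layouts; rows 0 and 2 are entirely zero.
csr = {"row_ptr": [0, 0, 2, 2, 3], "col_idx": [1, 3, 0], "values": [7, 8, 9]}
dcsr = {"row_ids": [1, 3], "row_ptr": [0, 2, 3],
        "col_idx": [1, 3, 0], "values": [7, 8, 9]}
x = [1, 2, 3, 4]

y_csr = csr_spmv(csr["row_ptr"], csr["col_idx"], csr["values"], x)
y_dcsr = dcsr_spmv(dcsr, x, num_rows=4)
```

Both kernels produce the same vector for this block. The difference is in how much index data each one reads and in what order, which is the property the two memory interfaces in the claims are matched to.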

15. A system comprising:
- a first memory unit;
- a second memory unit;
- one or more sparse tiles comprising a first plurality of processing units to access data from the first memory unit over a high bandwidth interface;
- one or more very/hyper sparse tiles comprising a second plurality of processing units to randomly access data from the second memory unit over a low-latency interface; and
- a control unit to:
  - determine that one or more computational tasks involving a matrix are to be performed;
  - partition the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
  - cause the one or more sparse tiles to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further cause the one or more very/hyper sparse tiles to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The system of claim 15, wherein the control unit is further to determine whether the matrix is sparse and has a skewed non-zero distribution, wherein the control unit is to perform the partition responsive to a determination that the matrix is sparse and does have the skewed non-zero distribution.
  - 17. The system of claim 15, wherein, to partition the matrix into the first plurality of blocks and the second plurality of blocks, the control unit is to:
    - determine a number of rows or columns of the matrix having only zero values.
  - 18. The system of claim 17, wherein, to partition the matrix into the first plurality of blocks and the second plurality of blocks, the control unit is further to:
    - determine whether the number satisfies a threshold criteria.
  - 19. The system of claim 15, wherein the control unit is further to convert a format of each of the second plurality of blocks into a doubly-compressed format.
  - 20. The system of claim 15, wherein, to cause the one or more sparse tiles to perform the one or more matrix operations, the control unit is to cause the first plurality of blocks to be stored in a first memory unit, the first memory unit to stream data of the first plurality of blocks to the one or more sparse tiles over a high-bandwidth interface.

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Granted Patent

US 10,180,928 B2
CPC Class Codes

G06F 17/16   Matrix or vector computatio...

G06F 9/3001   Arithmetic instructions

G06F 9/30036   Instructions to perform ope...

G06F 9/30038   using a mask

H03M 7/30   Compression speech analysis...
