HETEROGENEOUS HARDWARE ACCELERATOR ARCHITECTURE FOR PROCESSING SPARSE MATRIX DATA WITH SKEWED NON-ZERO DISTRIBUTIONS
Abstract
Heterogeneous hardware accelerator architectures for processing sparse matrix data having skewed non-zero distributions are described. An accelerator includes sparse tiles to access data from a first memory over a high bandwidth interface and very/hyper sparse tiles to randomly access data from a second memory over a low-latency interface. The accelerator determines that one or more computational tasks involving a matrix are to be performed, partitions the matrix into a first plurality of blocks that includes one or more sparse sections of the matrix, and a second plurality of blocks that includes sections of the matrix that are very- or hyper-sparse. The accelerator causes the sparse tile(s) to perform one or more matrix operations for the computational task(s) using the first plurality of blocks and further causes the very/hyper sparse tile(s) to perform the one or more matrix operations for the computational task(s) using the second plurality of blocks.
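The partitioning step described in the abstract can be illustrated in software. The following is only a minimal sketch, not the patented implementation: it assumes the matrix is given as coordinate (row, column, value) triplets, that blocks are fixed-height row bands, and that a block is classified by comparing its non-zero density to a cutoff. The function name `partition_by_density`, the `block_rows` parameter, and the `threshold` value are all hypothetical; the source does not state how the sparse and very/hyper-sparse sections are distinguished.

```python
from collections import defaultdict


def partition_by_density(entries, n_rows, n_cols, block_rows, threshold):
    """Split non-zero entries into row bands and classify each band.

    entries:    iterable of (row, col, value) triplets for the non-zeros.
    block_rows: height of each row band (hypothetical blocking scheme).
    threshold:  hypothetical density cutoff; not taken from the source.

    Returns (sparse_blocks, hyper_blocks), each mapping a block index to
    the entries of that block. Bands with no non-zeros appear in neither.
    """
    blocks = defaultdict(list)
    for r, c, v in entries:
        blocks[r // block_rows].append((r, c, v))

    sparse_blocks, hyper_blocks = {}, {}
    for b, items in blocks.items():
        # The last band may be shorter than block_rows.
        rows_in_block = min(block_rows, n_rows - b * block_rows)
        density = len(items) / (rows_in_block * n_cols)
        target = sparse_blocks if density >= threshold else hyper_blocks
        target[b] = items
    return sparse_blocks, hyper_blocks


# A 4x4 matrix whose non-zeros are skewed toward the first two rows.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 3, 4.0), (3, 2, 5.0)]
sparse_blocks, hyper_blocks = partition_by_density(
    entries, n_rows=4, n_cols=4, block_rows=2, threshold=0.25
)
```

With these inputs, the first row band (density 0.5) lands in `sparse_blocks` and the second (density 0.125) in `hyper_blocks`. A real design could just as well block by columns or by two-dimensional tiles; row bands are used here only to keep the sketch short.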
20 Claims
1. A method in a hardware processor for processing sparse matrix data having a skewed non-zero distribution comprising:
determining, by the hardware processor, that one or more computational tasks involving a matrix are to be performed;
partitioning, by the hardware processor, the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
causing, by the hardware processor, one or more sparse tiles of the hardware processor to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further causing one or more very/hyper sparse tiles of the hardware processor to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
Dependent claims: 2, 3, 4, 5, 6, 7
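The "causing" step of claim 1 can be pictured as a dispatcher that hands each class of block to its own kind of tile while both contribute to the same matrix operation. The sketch below is an assumption-laden software analogy, not the claimed hardware: it uses sparse matrix-vector multiplication as the example operation, models both tile types with one kernel function, and records which tile type handled which block. The names `spmv_block` and `dispatch` and the tile labels are hypothetical.

```python
def spmv_block(block, x, y):
    """Accumulate y += A_block @ x for one block of (row, col, value) entries."""
    for r, c, v in block:
        y[r] += v * x[c]


def dispatch(sparse_blocks, hyper_blocks, x, n_rows):
    """Route each block to a tile type and combine the partial results.

    Both tile types run the same arithmetic here; in the described hardware
    they would differ in how they reach memory (high-bandwidth interface
    versus low-latency random access), which this sketch does not model.
    """
    y = [0.0] * n_rows
    log = []  # (block index, tile type) in dispatch order
    routes = (("sparse_tile", sparse_blocks), ("hyper_sparse_tile", hyper_blocks))
    for tile_name, blocks in routes:
        for block_id, block in sorted(blocks.items()):
            spmv_block(block, x, y)
            log.append((block_id, tile_name))
    return y, log


# Blocks of a 4x4 matrix, already split into the two classes (illustrative data).
sparse_blocks = {0: [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 3, 4.0)]}
hyper_blocks = {1: [(3, 2, 5.0)]}
y, log = dispatch(sparse_blocks, hyper_blocks, x=[1.0, 1.0, 1.0, 1.0], n_rows=4)
```

Because every non-zero belongs to exactly one block, the combined vector `y` matches what a single pass over the whole matrix would produce; only the routing differs.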
8. A hardware processor comprising:
one or more sparse tiles comprising a first plurality of processing units to access data from a first memory unit over a high bandwidth interface;
one or more very/hyper sparse tiles comprising a second plurality of processing units to randomly access data from a second memory unit over a low-latency interface; and
a control unit to:
determine that one or more computational tasks involving a matrix are to be performed;
partition the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
cause the one or more sparse tiles to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further cause the one or more very/hyper sparse tiles to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
Dependent claims: 9, 10, 11, 12, 13, 14
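The structural elements recited in claim 8 can be summarized as a small data model. This is a descriptive sketch only: the class names `Tile` and `Accelerator`, the field names, and the processing-unit counts are hypothetical and are not taken from the source, which specifies only that each tile type has a plurality of processing units, a memory unit, and an interface type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tile:
    kind: str               # "sparse" or "very/hyper sparse"
    processing_units: int   # hypothetical count; the claim says only "a plurality"
    memory_unit: str        # which memory unit the tile reads from
    interface: str          # "high bandwidth" or "low-latency"
    random_access: bool     # True where the claim recites random access


@dataclass
class Accelerator:
    tiles: List[Tile]

    def tiles_for(self, kind: str) -> List[Tile]:
        """Return the tiles of the given kind, as a control unit might look them up."""
        return [t for t in self.tiles if t.kind == kind]


accel = Accelerator(
    tiles=[
        Tile("sparse", 4, "first memory unit", "high bandwidth", False),
        Tile("very/hyper sparse", 4, "second memory unit", "low-latency", True),
    ]
)
```

The model captures the pairing the claim relies on: sparse tiles with a high bandwidth interface, and very/hyper sparse tiles with a low-latency interface suited to random access.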
15. A system comprising:
a first memory unit;
a second memory unit;
one or more sparse tiles comprising a first plurality of processing units to access data from the first memory unit over a high bandwidth interface;
one or more very/hyper sparse tiles comprising a second plurality of processing units to randomly access data from the second memory unit over a low-latency interface; and
a control unit to:
determine that one or more computational tasks involving a matrix are to be performed;
partition the matrix into a first plurality of blocks and a second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse; and
cause the one or more sparse tiles to perform one or more matrix operations for the one or more computational tasks using the first plurality of blocks and further cause the one or more very/hyper sparse tiles to perform the one or more matrix operations for the one or more computational tasks using the second plurality of blocks.
Dependent claims: 16, 17, 18, 19, 20
Specification