Matrix Ordering for Cache Efficiency in Performing Large Sparse Matrix Operations
First Claim
1. A method, in a data processing system comprising a processor and a memory, for performing a matrix operation, the method comprising:
- configuring the processor of the data processing system to perform cluster-based matrix reordering of an input matrix;
receiving, by the processor, the input matrix, wherein the input matrix comprises nodes associated with elements of the matrix;
clustering, by the processor, the nodes into clusters based on numbers of connections with other nodes within and between the clusters;
ordering, by the processor, the clusters by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix;
generating, by the processor, a lookup table identifying new locations of nodes of the input matrix, in the reordered matrix;
storing, in a memory of the data processing system, data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix; and
performing, by the processor, a matrix operation based on the reordered matrix and the lookup table at least by loading data corresponding to nodes in the reordered matrix into a cache memory of the data processing system, wherein the storage of the data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix minimizes cache misses in the cache memory when performing the matrix operation.
1 Assignment
0 Petitions
Accused Products
Abstract
Mechanisms are provided for performing a matrix operation. A processor of a data processing system is configured to perform cluster-based matrix reordering of an input matrix. An input matrix, which comprises nodes associated with elements of the matrix, is received. The nodes are clustered into clusters based on numbers of connections with other nodes within and between the clusters, and the clusters are ordered by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix. A lookup table is generated identifying new locations of nodes of the input matrix, in the reordered matrix. A matrix operation is then performed based on the reordered matrix and the lookup table.
-
Citations
21 Claims
-
1. A method, in a data processing system comprising a processor and a memory, for performing a matrix operation, the method comprising:
-
configuring the processor of the data processing system to perform cluster-based matrix reordering of an input matrix; receiving, by the processor, the input matrix, wherein the input matrix comprises nodes associated with elements of the matrix; clustering, by the processor, the nodes into clusters based on numbers of connections with other nodes within and between the clusters; ordering, by the processor, the clusters by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix; generating, by the processor, a lookup table identifying new locations of nodes of the input matrix, in the reordered matrix; storing, in a memory of the data processing system, data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix; and performing, by the processor, a matrix operation based on the reordered matrix and the lookup table at least by loading data corresponding to nodes in the reordered matrix into a cache memory of the data processing system, wherein the storage of the data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix minimizes cache misses in the cache memory when performing the matrix operation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
-
configure the computing device to perform cluster-based matrix reordering of an input matrix; receive the input matrix, wherein the input matrix comprises nodes associated with elements of the matrix; cluster the nodes into clusters based on numbers of connections with other nodes within and between the clusters; order the clusters by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix; generate a lookup table identifying new locations of nodes of the input matrix, in the reordered matrix; store, in a memory of the computing device, data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix; and perform a matrix operation based on the reordered matrix and the lookup table at least by loading data corresponding to nodes in the reordered matrix into a cache memory of the computing device, wherein the storage of the data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix minimizes cache misses in the cache memory when performing the matrix operation. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
-
20. An apparatus comprising:
-
a processor; a cache coupled to the processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, configure the processor to perform cluster-based matrix reordering of an input matrix and to; receive the input matrix, wherein the input matrix comprises nodes associated with elements of the matrix; cluster the nodes into clusters based on numbers of connections with other nodes within and between the clusters; order the clusters by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix; generate a lookup table identifying new locations of nodes of the input matrix, in the reordered matrix; store, in the memory, data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix; and perform, by the processor, a matrix operation based on the reordered matrix and the lookup table at least by loading data corresponding to nodes in the reordered matrix into the cache, wherein the storage of the data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix minimizes cache misses in the cache when performing the matrix operation.
-
-
21. A Question and Answer (QA) system, comprising:
-
at least one processor; and an interface for accessing one or more storage devices that store a corpus of natural language content that is processed by the QA system to generate answers to received questions, wherein the at least one processor is configured to implement; a QA system pipeline that receives an input question and generates one or more answers to the input question at least by processing the corpus of natural language content; a matrix ordering engine coupled to the QA system pipeline, wherein the matrix reordering engine is configured to; receive an input matrix, wherein the input matrix comprises nodes associated with elements of the matrix, and wherein the nodes represent concepts found in the natural language content and the elements of the matrix represent connections between the concepts found in the natural language content; cluster the nodes into clusters based on numbers of connections with other nodes within and between the clusters; order the clusters by minimizing a total length of cross cluster connections between nodes of the clusters, to thereby generate a reordered matrix; generate a lookup table identifying new locations of nodes of the input matrix, in the reordered matrix; store, in the memory, data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix; and perform, by the processor, a matrix operation based on the reordered matrix and the lookup table at least by loading data corresponding to nodes in the reordered matrix into the cache, wherein the storage of the data corresponding to the nodes in accordance with the new locations of nodes in the reordered matrix minimizes cache misses in the cache when performing the matrix operation.
-
Specification