DEVICE AND METHOD FOR ACCELERATING MATRIX MULTIPLY OPERATIONS
First Claim
1. A processing device comprising:
memory configured to store data; and
a plurality of processor cores in communication with each other via first hierarchical communication links and second hierarchical communication links, each processor core in a group of the plurality of processor cores being in communication with each other via the first hierarchical communication links and configured to:
store, in the memory, one of a plurality of sub-portions of data of a first matrix;
store, in the memory, one of a plurality of sub-portions of data of a second matrix;
determine a product of the one sub-portion of data of the first matrix and the one sub-portion of data of the second matrix;
receive, from another processor core of the group of processor cores, another of the sub-portions of data of the second matrix; and
determine a product of the one sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
Abstract
A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, to receive, from another processor core, another sub-portion of data of the second matrix, and to determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
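The data flow summarized in the abstract can be illustrated with a small software simulation. The following is only a sketch under stated assumptions, not the patented implementation: it assumes the first matrix is split into row blocks and the second into column blocks, that the cores of one group form a ring, and that matrix dimensions divide evenly by the number of cores. The names `ring_matmul`, `split_rows`, `split_cols`, `memory_reads`, and `link_transfers` are hypothetical and do not appear in the source.

```python
def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks (assumes even division)."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]


def split_cols(m, parts):
    """Split a matrix into `parts` equal column blocks (assumes even division)."""
    step = len(m[0]) // parts
    return [[row[j * step:(j + 1) * step] for row in m] for j in range(parts)]


def ring_matmul(a, b, cores):
    """Simulate a group of cores that each keep one block of `a` and pass
    blocks of `b` to a neighbouring core instead of re-reading memory."""
    a_blocks = split_rows(a, cores)   # core i keeps a_blocks[i] throughout
    b_blocks = split_cols(b, cores)   # core i starts with b_blocks[i]
    held = list(range(cores))         # held[i]: index of the b block at core i
    memory_reads = 2 * cores          # one block of a and one of b per core
    link_transfers = 0
    c_blocks = [[None] * cores for _ in range(cores)]

    for step in range(cores):
        # Every core multiplies its fixed block of a by the b block it holds.
        for i in range(cores):
            j = held[i]
            c_blocks[i][j] = matmul(a_blocks[i], b_blocks[j])
        # Every core then receives the b block held by the previous core.
        if step < cores - 1:
            held = [held[(i - 1) % cores] for i in range(cores)]
            link_transfers += cores

    # Stitch the per-core result blocks back into one matrix.
    result = []
    for i in range(cores):
        for r in range(len(a_blocks[i])):
            row = []
            for j in range(cores):
                row.extend(c_blocks[i][j][r])
            result.append(row)
    return result, memory_reads, link_transfers


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
product, memory_reads, link_transfers = ring_matmul(a, b, cores=2)
```

In this sketch each block of the second matrix is read from memory once and afterwards travels only over the simulated core-to-core links, which is the effect the abstract describes.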
20 Claims
1. A processing device comprising:
memory configured to store data; and
a plurality of processor cores in communication with each other via first hierarchical communication links and second hierarchical communication links, each processor core in a group of the plurality of processor cores being in communication with each other via the first hierarchical communication links and configured to:
store, in the memory, one of a plurality of sub-portions of data of a first matrix;
store, in the memory, one of a plurality of sub-portions of data of a second matrix;
determine a product of the one sub-portion of data of the first matrix and the one sub-portion of data of the second matrix;
receive, from another processor core of the group of processor cores, another of the sub-portions of data of the second matrix; and
determine a product of the one sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A processing device comprising:
memory configured to store data; and
a plurality of processor cores in communication with each other via first hierarchical communication links, the plurality of processor cores comprising a first processor core and a second processor core, wherein the first processor core is configured to:
determine a product of a first sub-portion of data of a first matrix received from the memory and a first sub-portion of data of a second matrix received from the memory; and
communicate, to the second processor core via one of the first hierarchical communication links, the first sub-portion of data of the second matrix; and
the second processor core is configured to:
receive the first sub-portion of data of the second matrix communicated by the first processor without accessing the memory; and
determine a product of the first sub-portion of data of the second matrix received from the first processor and a second sub-portion of data of the first matrix received from the memory.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method for use in a processing device having a plurality of processor cores for performing matrix multiplication, the method comprising:
receiving, from memory by a first processor core, a first sub-portion of data of a first matrix;
receiving, from the memory by the first processor core, a first sub-portion of data of a second matrix;
determining, by the first processor core, a product of the first sub-portion of data of the first matrix and the first sub-portion of data of the second matrix;
communicating, by the first processor core to a second processor core, the first sub-portion of data of the second matrix via one of a plurality of first hierarchical communication links;
receiving, from the memory by the second processor core, a second sub-portion of data of the first matrix; and
determining, by the second processor core, a product of the second sub-portion of data of the first matrix and the first sub-portion of data of the second matrix.
Dependent claims: 16, 17, 18, 19, 20
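The steps recited in claim 15 can be traced with a two-core toy model. This is a hedged illustration only, not the claimed hardware: `Memory`, `Link`, and `run_two_core_step` are hypothetical names, the link is modeled as a one-slot mailbox, and the block labels `A1`, `A2`, and `B1` are assumed for the example.

```python
class Memory:
    """Hypothetical shared memory that counts reads per named block."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = {name: 0 for name in blocks}

    def read(self, name):
        self.reads[name] += 1
        return self.blocks[name]


class Link:
    """Hypothetical core-to-core link, modeled as a one-slot mailbox."""

    def __init__(self):
        self.slot = None
        self.transfers = 0

    def send(self, block):
        self.slot = block
        self.transfers += 1

    def receive(self):
        block, self.slot = self.slot, None
        return block


def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def run_two_core_step(memory, link):
    # First core: read its block of each matrix from memory and multiply.
    a1 = memory.read("A1")
    b1 = memory.read("B1")
    c11 = matmul(a1, b1)
    # First core: forward the block of the second matrix over the link.
    link.send(b1)

    # Second core: read only its block of the first matrix from memory.
    a2 = memory.read("A2")
    # Second core: take the forwarded block from the link, not from memory.
    b1_forwarded = link.receive()
    c21 = matmul(a2, b1_forwarded)
    return c11, c21


memory = Memory({"A1": [[1, 2]], "A2": [[3, 4]], "B1": [[5], [6]]})
link = Link()
c11, c21 = run_two_core_step(memory, link)
```

After the run, `memory.reads["B1"]` is 1 and `link.transfers` is 1: in this model the second core obtains the block of the second matrix without a second memory access.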
Specification