Matrix multiply with reduced bandwidth requirements
Abstract
Systems and methods for reducing the bandwidth needed to read the inputs to a matrix multiply operation may improve system performance. Rather than reading a row of a first input matrix and a column of a second input matrix to produce a column of a product matrix, a column of the first input matrix and a single element of the second input matrix are read to produce a column of partial dot products of the product matrix. Therefore, the number of input matrix elements read to produce each product matrix element is reduced from 2N to N+1, where N is the number of elements in a column of the product matrix.
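To illustrate the read-count argument in the abstract, the following Python sketch computes one column of a product matrix both ways and counts input element reads. It is a minimal sketch under stated assumptions: the function names, the list-of-rows matrix layout, and the read counters are hypothetical, and counting reads per multiply-add step across N lanes is one interpretation of the 2N versus N+1 figures, not something the source spells out.

```python
def column_via_per_lane_reads(a, b_col):
    """Each lane fetches its own pair of operands at every step (2N reads per step)."""
    n, k = len(a), len(b_col)
    reads = 0
    col = [0] * n
    for j in range(k):              # one multiply-add step per inner index
        for i in range(n):          # N lanes, one per product element
            col[i] += a[i][j] * b_col[j]
            reads += 2              # one element of a, one element of b_col
    return col, reads


def column_via_broadcast(a, b_col):
    """One element of b_col is read once per step and shared (N+1 reads per step)."""
    n, k = len(a), len(b_col)
    reads = 0
    col = [0] * n
    for j in range(k):
        scalar = b_col[j]           # single read, broadcast to every lane
        reads += 1
        for i in range(n):
            col[i] += a[i][j] * scalar   # partial dot product for lane i
            reads += 1              # one element from column j of a
    return col, reads


a = [[1, 2], [3, 4]]
b_col = [5, 6]
print(column_via_per_lane_reads(a, b_col))  # ([17, 39], 8)
print(column_via_broadcast(a, b_col))       # ([17, 39], 6)
```

With N = 2, each step costs 2N = 4 reads in the first function and N + 1 = 3 in the second, while both produce the same product column.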
20 Claims
1. A method of executing a set of operations including a broadcast operand for multiple threads or lanes, comprising:
obtaining a first value specified by the broadcast operand included with the set of operations;
providing the first value to multiple program instruction execution units;
obtaining a set of second values specified by the parallel operand included with the set of operations, wherein each one of the second values corresponds to one of the multiple threads or lanes;
providing one second value of the set of second values to each one of the multiple program instruction execution units; and
executing the set of operations for each one of the multiple threads or lanes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
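The steps of claim 1 can be pictured with a small Python sketch that models the execution units as iterations of a loop. It is only an illustration under stated assumptions: `execute_with_broadcast`, its parameters, and the use of a Python callable for the "set of operations" are hypothetical names chosen here, and no actual hardware behavior is implied.

```python
def execute_with_broadcast(operation, broadcast_value, parallel_values):
    """Run `operation` once per lane.

    Every lane receives the same broadcast value (obtained once) and its own
    entry from `parallel_values` (one second value per thread or lane).
    """
    results = []
    for lane_value in parallel_values:      # one iteration models one lane
        results.append(operation(broadcast_value, lane_value))
    return results


# Hypothetical example: four lanes, multiply as the operation.
lane_outputs = execute_with_broadcast(lambda b, p: b * p, 3, [1, 2, 4, 8])
print(lane_outputs)  # [3, 6, 12, 24]
```

The broadcast value is fetched once and reused by all four lanes, whereas each parallel value is consumed by exactly one lane.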
13. A method of multiplying a first matrix and a first column of a second matrix to produce a first column of a product matrix, comprising:
multiplying each element of a first column of the first matrix by a first element of the first column of the second matrix to produce a first group of elements corresponding to the first column of the product matrix;
storing the first group of elements corresponding to a column of the product matrix in a set of registers;
multiplying each element of a second column of the first matrix by a second element of the first column of the second matrix to produce a second group of elements corresponding to the first column of the product matrix;
summing each element of the stored group of elements with a corresponding element of the second group of elements to produce a group of product elements within the first column of the product matrix; and
storing the group of product elements in the set of registers.
Dependent claims: 14, 15, 16
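The five steps of claim 13 can be traced with a short worked example. The sketch below assumes a first matrix with exactly two columns, stored as a list of rows, and uses a plain Python list named `registers` to stand in for the set of registers; the function name and the data layout are hypothetical choices for illustration.

```python
def first_product_column(first_matrix, second_matrix_col):
    """Follow the claimed steps for a first matrix that has two columns."""
    n = len(first_matrix)

    # Multiply the first column of the first matrix by the first element of
    # the second matrix's first column, and store the group in registers.
    registers = [first_matrix[i][0] * second_matrix_col[0] for i in range(n)]

    # Multiply the second column of the first matrix by the second element.
    second_group = [first_matrix[i][1] * second_matrix_col[1] for i in range(n)]

    # Sum each stored element with its counterpart and store the result
    # back in the same registers.
    registers = [stored + new for stored, new in zip(registers, second_group)]
    return registers


a = [[1, 2], [3, 4], [5, 6]]
b_col = [10, 100]
print(first_product_column(a, b_col))  # [210, 430, 650]
```

Here the first group is [10, 30, 50], the second group is [200, 400, 600], and their element-wise sum is the product column. For a first matrix with more than two columns, the multiply and sum steps would presumably repeat once per remaining column.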
17. A computer readable medium storing instructions for causing a processor to multiply a first matrix and a first column of a second matrix to produce a first column of a product matrix, by performing the steps of:
multiplying each element of a first column of the first matrix by a first element of the first column of the second matrix to produce a first group of elements corresponding to the first column of the product matrix;
storing the first group of elements corresponding to a column of the product matrix in a set of registers;
multiplying each element of a second column of the first matrix by a second element of the first column of the second matrix to produce a second group of elements corresponding to the first column of the product matrix;
summing each element of the stored group of elements with a corresponding element of the second group of elements to produce a group of product elements within the first column of the product matrix; and
storing the group of product elements in the set of registers.
Dependent claims: 18, 19, 20
Specification