High speed and efficient matrix multiplication hardware module
First Claim
1. A matrix multiplication device for computing a multiplication of a first matrix with a second matrix, comprising:
a first memory for storing data elements of the first matrix and a second memory for storing data elements of the second matrix;
a plurality of multiplier-accumulator units each of which comprises a multiplier circuit configured to multiply two data elements to produce a product value and an adder circuit configured to add the product value with an addend value to produce a result value; and
a storage unit comprising a plurality of storage locations;
wherein the plurality of multiplier-accumulator units are configured to perform the multiplication of the first matrix with the second matrix in N computation stages, where N is the number of rows of the first matrix and N is greater than 1, and for each integer j from 1 to N, to read data elements of the jth row of the first matrix from the first memory and data elements of the jth column of the second matrix from the second memory during a jth computation stage, and to use 2j−1 of the plurality of multiplier-accumulator units during the jth computation stage to multiply data elements of the jth row of the first matrix by data elements of the jth column of the second matrix and by data elements of each column of the second matrix preceding the jth column and to multiply data elements of each row of the first matrix preceding the jth row by data elements of the jth column of the second matrix;
wherein the storage unit is configured to store data elements of the first and second matrices that are applied to a multiplier-accumulator unit during a computation stage for use in a subsequent computation stage.
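The staged schedule recited in the claim can be illustrated with a short software sketch. This is only a sequential model under stated assumptions, not the patented hardware: the function name `staged_matmul` and the lists `rows_seen` and `cols_seen` (standing in for the storage unit) are hypothetical, the MAC units are simulated one after another rather than in parallel, and cycle-level timing is not modeled.

```python
def staged_matmul(a, b):
    """Multiply an N x M matrix a by an M x N matrix b in N stages.

    Returns the N x N product and the number of MAC units used per stage.
    Hypothetical software model; real MAC units would run in parallel.
    """
    n, m = len(a), len(b)
    c = [[0] * n for _ in range(n)]
    rows_seen, cols_seen = [], []  # assumption: stands in for the storage unit
    macs_per_stage = []
    for j in range(n):  # stage j + 1 in the claim's 1-based numbering
        row_j = a[j]                           # read from the first memory
        col_j = [b[k][j] for k in range(m)]    # read from the second memory
        # Output cells that become computable at this stage, one MAC each:
        # new row x new column, new row x earlier columns,
        # earlier rows x new column.
        cells = [(j, j, row_j, col_j)]
        cells += [(j, k, row_j, cols_seen[k]) for k in range(j)]
        cells += [(i, j, rows_seen[i], col_j) for i in range(j)]
        for i, k, row, col in cells:
            acc = 0
            for x, y in zip(row, col):
                acc = x * y + acc  # multiply, then add the addend value
            c[i][k] = acc
        macs_per_stage.append(len(cells))
        # Keep the new row and column for use in subsequent stages.
        rows_seen.append(row_j)
        cols_seen.append(col_j)
    return c, macs_per_stage


product, macs = staged_matmul([[1, 2], [3, 4], [5, 6]],
                              [[7, 8, 9], [10, 11, 12]])
print(product)  # [[27, 30, 33], [61, 68, 75], [95, 106, 117]]
print(macs)     # [1, 3, 5]
```

In this model each output element is produced at the stage where the later of its row and column arrives, which is why stage j handles 2j−1 elements.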
Abstract
A matrix multiplication module and matrix multiplication method are provided that use a variable number of multiplier-accumulator (MAC) units, based on the number of data elements of the matrices that are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more MAC units are used to perform the necessary multiplication and addition operations. To multiply an N×M matrix by an M×N matrix, the maximum number of MAC units used is 2*N−1. The number of MAC units used starts at one and increases by two at each computation stage, that is, at the beginning of reading the data elements for each new row of the first (left-hand) matrix. The number of MAC units used over the computation stages therefore follows the sequence {1, 3, 5, . . . , 2*N−1}. For the multiplication of two 8×8 matrices, the performance is 16 floating point operations per clock cycle; for an FPGA running at 100 MHz, this is 1.6 Giga floating point operations per second. Performance increases with clock frequency and with the use of larger matrices, when FPGA resources permit. Very large matrices are partitioned into smaller blocks that fit in the FPGA resources, and the results of multiplying the sub-matrices are combined to form the final result for the large matrices.
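The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `mac_units_per_stage` and `peak_flops` are hypothetical, and the value of 16 floating point operations per clock cycle is taken from the abstract as given rather than derived.

```python
def mac_units_per_stage(n):
    """MAC units used at stages 1..n: the sequence 1, 3, 5, ..., 2*n - 1."""
    return [2 * j - 1 for j in range(1, n + 1)]


def peak_flops(flops_per_cycle, clock_hz):
    """Floating point operations per second at a given clock frequency."""
    return flops_per_cycle * clock_hz


schedule = mac_units_per_stage(8)
print(schedule)                      # [1, 3, 5, 7, 9, 11, 13, 15]
print(sum(schedule))                 # 64
print(peak_flops(16, 100_000_000))   # 1600000000
```

The per-stage counts for N = 8 sum to 64, which matches the 8×8 = 64 elements of the result matrix, and 16 operations per cycle at 100 MHz gives 1.6 × 10^9 operations per second, as stated.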
81 Citations
21 Claims
1. A matrix multiplication device for computing a multiplication of a first matrix with a second matrix, comprising:
a first memory for storing data elements of the first matrix and a second memory for storing data elements of the second matrix;
a plurality of multiplier-accumulator units each of which comprises a multiplier circuit configured to multiply two data elements to produce a product value and an adder circuit configured to add the product value with an addend value to produce a result value; and
a storage unit comprising a plurality of storage locations;
wherein the plurality of multiplier-accumulator units are configured to perform the multiplication of the first matrix with the second matrix in N computation stages, where N is the number of rows of the first matrix and N is greater than 1, and for each integer j from 1 to N, to read data elements of the jth row of the first matrix from the first memory and data elements of the jth column of the second matrix from the second memory during a jth computation stage, and to use 2j−1 of the plurality of multiplier-accumulator units during the jth computation stage to multiply data elements of the jth row of the first matrix by data elements of the jth column of the second matrix and by data elements of each column of the second matrix preceding the jth column and to multiply data elements of each row of the first matrix preceding the jth row by data elements of the jth column of the second matrix;
wherein the storage unit is configured to store data elements of the first and second matrices that are applied to a multiplier-accumulator unit during a computation stage for use in a subsequent computation stage.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A matrix multiplication hardware core device that multiplies first and second matrices, comprising:
a first memory for storing data elements of a first matrix and a second memory for storing data elements of a second matrix;
a plurality of multiplier-accumulator units each of which is configured to multiply a first data element from the first matrix with a second data element from the second matrix to produce a product value and to add the product value with an addend value to produce a result value; and
a storage unit comprising a plurality of registers that store data elements for one or more rows of the first matrix and for one or more columns of the second matrix for subsequent supply as input to a multiplier-accumulator unit; and
wherein the plurality of multiplier-accumulator units are configured to perform the multiplication of the first matrix with the second matrix in N computation stages, where N is the number of rows of the first matrix and N is greater than 1, and for each integer j from 1 to N, to read data elements of the jth row of the first matrix from the first memory and data elements of the jth column of the second matrix from the second memory during a jth computation stage, and to use 2j−1 of the plurality of multiplier-accumulator units during the jth computation stage to multiply data elements of the jth row of the first matrix by data elements of the jth column of the second matrix and by data elements of each column of the second matrix preceding the jth column and to multiply data elements of each row of the first matrix preceding the jth row by data elements of the jth column of the second matrix;
wherein the storage unit is configured to store data elements of the first and second matrices that are applied to a multiplier-accumulator unit during a computation stage for use in a subsequent computation stage.
Dependent claims: 11, 12, 13, 14, 15
16. A method for multiplying first and second matrices, comprising:
storing data elements of a first matrix in a first memory and storing data elements of a second matrix in a second memory;
providing a plurality of multiplier-accumulator units to multiply one data element of the first matrix with a data element of the second matrix to produce a product value and to add the product value with an addend value to produce a result value;
performing the multiplication of the first matrix with the second matrix in N computation stages where N is the number of rows of the first matrix and N is greater than 1;
wherein for each integer j from 1 to N, reading data elements of the jth row of the first matrix from the first memory and data elements of the jth column of the second matrix from the second memory during a jth computation stage;
using 2j−1 of the plurality of multiplier-accumulator units during the jth computation stage, multiplying data elements of the jth row of the first matrix by data elements of the jth column of the second matrix and by data elements of each column of the second matrix preceding the jth column, and multiplying data elements of each row of the first matrix preceding the jth row by data elements of the jth column of the second matrix; and
storing in a storage unit data elements of the first and second matrices that are applied to a multiplier-accumulator unit during a computation stage for use in a subsequent computation stage.
Dependent claims: 17, 18, 19, 20, 21
Specification