High Speed and Efficient Matrix Multiplication Hardware Module
First Claim
1. A matrix multiplication device, comprising a plurality of multiplier-accumulator units each of which comprises a multiplier circuit that multiplies two data elements to produce a product value and an adder circuit that adds the product value with an addend value to produce a result value, wherein a number of multiplier-accumulator units used when a multiplication computation is performed on first and second matrices varies depending on which computation stage corresponding to which row of the first matrix the multiplication computation is executing.
3 Assignments
0 Petitions
Accused Products
Abstract
A matrix multiplication module and matrix multiplication method are provided that use a variable number of multiplier-accumulator units based on the amount of data elements of the matrices are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more multiplier-accumulator units are used to perform the necessary multiplication and addition operations. To multiply an N×M matrix by an M×N matrix, the total (maximum) number of used MAC units is “2*N−1”. The number of MAC units used starts with one (1) and increases by two at each computation stage, that is, at the beginning of reading of data elements for each new row of the first matrix. The sequence of the number of MAC units is {1, 3, 5, . . . , 2*N−1} for computation stages each of which corresponds to reading of data elements for each new row of the left hand matrix, also called the first matrix. For the multiplication of two 8×8 matrices, the performance is 16 floating point operations per clock cycle. For an FPGA running at 100 MHz, the performance is 1.6 Giga floating point operations per second. The performance increases with the increase of the clock frequency and the use of larger matrices when FPGA resources permit. Very large matrices are partitioned into smaller blocks to fit in the FPGA resources. Results from the multiplication of sub-matrices are combined to form the final result of the large matrices.
55 Citations
23 Claims
- 1. A matrix multiplication device, comprising a plurality of multiplier-accumulator units each of which comprises a multiplier circuit that multiplies two data elements to produce a product value and an adder circuit that adds the product value with an addend value to produce a result value, wherein a number of multiplier-accumulator units used when a multiplication computation is performed on first and second matrices varies depending on which computation stage corresponding to which row of the first matrix the multiplication computation is executing.
-
12. A matrix multiplication hardware core that multiplies first and second matrices, comprising:
-
a. a plurality of multiplier-accumulator units each of which multiplies a first data element from the first matrix with a second data element from the second matrix to produce a product value and adds the product value with an addend value to produce a result value, wherein a number of multiplier-accumulator units used when multiplying the first and second matrices increases as computations progress from a first row of the first matrix to remaining rows of the first matrix; and b. a storage unit comprising a plurality of registers that store data elements for one or more rows of the first matrix and for one or more columns of the second matrix for subsequent supply as input to a multiplier-accumulator unit. - View Dependent Claims (13, 14, 15, 16, 17)
-
-
18. A method for multiplying first and second matrices, comprising:
-
a. providing a plurality of multiplier-accumulator units, each of which is capable of multiplying one data element of the first matrix with a data element of the second matrix to produce a product value and adding the product value with an addend value to produce a result value; and b. using a number of said plurality of multiplier-accumulator units that increases as computation progresses from a first row of the first matrix to remaining rows of the first matrix. - View Dependent Claims (19, 20, 21, 22, 23)
-
Specification