Efficient multiplication of small matrices using SIMD registers
Abstract
A matrix multiplication method that reduces calculation time on SIMD processors is described. The method loads each diagonal of the multiplicand matrix c into a different register of a processor and loads a multiplier matrix a into at least one register in column order. Multiplication and addition elements in each column of multiplier matrix a in the register are selectively shifted by one element, with the last element of a column shifted to the front of the column. Diagonals of the multiplicand matrix c are multiplied by columns of the multiplier matrix a, with each product added to the sum of products for the corresponding column of a result matrix.
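The data flow in the abstract can be illustrated with a small scalar sketch in which plain Python lists stand in for SIMD registers. It is not the patented implementation. The helper names (rotate_down, wrapped_diagonal, diagonal_matmul) are hypothetical, and the sketch assumes square matrices, diagonals that wrap around the matrix edge, and a result equal to the product c·a; none of these details is spelled out in the text above.

```python
def rotate_down(col):
    """Shift a column by one element; the last element moves to the front."""
    return [col[-1]] + col[:-1]


def wrapped_diagonal(c, s):
    """Return the s-th wrapped diagonal of c: element i is c[i][(i - s) mod n].

    The wrap-around indexing is an assumption chosen to match rotate_down.
    """
    n = len(c)
    return [c[i][(i - s) % n] for i in range(n)]


def diagonal_matmul(c, a):
    """Compute c @ a from diagonals of c and rotated columns of a.

    Each list plays the role of one SIMD register; every inner-loop step is
    an element-wise multiply-add, so no scalar broadcast is needed.
    """
    n = len(c)
    # One "register" per diagonal of the multiplicand c.
    diagonals = [wrapped_diagonal(c, s) for s in range(n)]
    # Multiplier a held in column order: columns[j][k] == a[k][j].
    columns = [[a[k][j] for k in range(n)] for j in range(n)]
    # Running sums of products, one per column of the result.
    result_cols = [[0] * n for _ in range(n)]

    for s in range(n):
        for j in range(n):
            for i in range(n):
                result_cols[j][i] += diagonals[s][i] * columns[j][i]
        # Rotate every column of a by one element before the next diagonal.
        columns = [rotate_down(col) for col in columns]

    # Convert the column-ordered result back to row-major form.
    return [[result_cols[j][i] for j in range(n)] for i in range(n)]


def naive_matmul(c, a):
    """Reference triple-loop product, used only to check the sketch."""
    n = len(c)
    return [[sum(c[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

After s rotations, element i of column j holds a[(i - s) mod n][j], which is the element that pairs with c[i][(i - s) mod n] from the s-th diagonal. Summing over all s therefore covers every term of the dot product for result element (i, j).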
30 Claims
1. A matrix multiplication method, comprising:
loading each diagonal of the multiplicand matrix c into processor accessible memory, loading a multiplier matrix a into processor accessible memory in column order, shifting elements in each column of multiplier matrix a in the register by shifting one element, with the last element of a column shifted to the front of the column, and multiplying diagonals of the multiplicand c matrix by columns of the multiplier a matrix, with their product being added to the sum of products for columns of a result matrix.
11. An article comprising a storage medium having stored thereon instructions that when executed by a machine result in:
loading each diagonal of the multiplicand matrix c into processor accessible memory, loading a multiplier matrix a into processor accessible memory in column order, shifting the elements in each column of multiplier matrix a in the register by shifting one element, with the last element of a column shifted to the front of the column, and multiplying diagonals of the multiplicand c matrix by columns of the multiplier a matrix, with their product being added to the sum of products for columns of a result matrix.
21. A system comprising:
a processor having registers that load each diagonal of the multiplicand matrix c into processor accessible memory, with a multiplier matrix a loaded into processor accessible memory in column order, and control logic to shift the multiplication and addition elements in each column of multiplier matrix a in the registers by shifting one element, with the last element of a column shifted to the front of the column, and multiply diagonals of the multiplicand c matrix by columns of the multiplier a matrix, with their product being added to the sum of products for columns of a result matrix.
Specification