Rearranging data between vector and matrix forms in a SIMD matrix processor
Abstract
This invention discloses a group of instructions, block4 and block4v, in a matrix processor 16 that rearranges data between vector and matrix forms of an A×B matrix of data 120, where the data matrix includes one or more 4×4 sub-matrices of data 160-166. The instructions simultaneously swap rows or columns between the first 140, second 142, third 144, and fourth 146 matrix registers, performing predefined matrix tensor operations on the data matrix that include one of the following: swapping rows between the different individual matrix registers, or swapping columns between the different individual matrix registers. Additionally, successive iterations or combinations of the block4 and/or block4v instructions perform standard tensor matrix operations from the following group: transpose, shuffle, and deal.
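The row and column swaps described in the abstract can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the patented instruction encoding: the text here does not say which rows or columns trade places, so the sketch assumes that row i of register k trades places with row k of register i, and likewise for columns. The names `make_registers`, `swap_rows`, `swap_cols`, and `transpose_each` are hypothetical, and no mapping of either swap onto block4 or block4v is implied.

```python
N = 4  # four matrix registers, each a 4x4 array of PE register entries


def make_registers():
    """Tag register k, row i, column j with the tuple (k, i, j)."""
    return [[[(k, i, j) for j in range(N)] for i in range(N)] for k in range(N)]


def swap_rows(regs):
    """Assumed row swap: row i of register k trades places with row k of register i."""
    return [[[regs[i][k][j] for j in range(N)] for i in range(N)] for k in range(N)]


def swap_cols(regs):
    """Assumed column swap: column j of register k trades places with column k of register j."""
    return [[[regs[j][i][k] for j in range(N)] for i in range(N)] for k in range(N)]


def transpose_each(regs):
    """Reference result: each register transposed in place."""
    return [[[regs[k][j][i] for j in range(N)] for i in range(N)] for k in range(N)]


regs = make_registers()

# Each swap is its own inverse.
assert swap_rows(swap_rows(regs)) == regs
assert swap_cols(swap_cols(regs)) == regs

# Under the assumed semantics, a row swap, a column swap, and a second
# row swap together transpose every register.
assert swap_rows(swap_cols(swap_rows(regs))) == transpose_each(regs)
```

Under these assumptions each swap exchanges two of the three indices (register, row, column), so a transpose, which exchanges row and column, falls out of three successive swaps. This is one way to read the statement that combinations of the instructions perform a transpose.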
13 Claims
-
1. A group of instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
Dependent claims: 6, 7.
-
2. A Matrix Processor that includes instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
-
3. A system that includes a Matrix Processor with instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between different said first, second, third, and fourth registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
-
4. A method to make a Matrix Processor that includes instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
coupling said processing elements into a 4×4 matrix processing array with a mesh row column interconnect;
providing a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein said first, second, third, and fourth matrix registers simultaneously swap rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
-
5. A method to use instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
providing a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
providing a first, second, third, and fourth matrix register wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; and
simultaneously swapping rows or columns between said first, second, third, and fourth matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix registers, or swapping columns between said first, second, third, and fourth matrix registers.
-
8. A group of instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises 16 PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register of said 16 matrix registers that simultaneously swaps rows or columns between said first, second, third, and fourth matrix registers of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and
wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders:
4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
-
9. A Matrix Processor that includes instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register of said 16 matrix registers that simultaneously swaps rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and
wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders:
4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
-
10. A system that includes a Matrix Processor with instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register that simultaneously swaps rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and
wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders:
4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
Dependent claims: 13.
-
11. A method to make a Matrix Processor that includes instructions that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
coupling said processing elements into a 4×4 matrix processing array with a mesh row column interconnect;
providing 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register;
wherein a group of said 16 matrix registers comprises a first, second, third, and fourth matrix register that simultaneously swaps rows or columns between said first, second, third, and fourth matrix register of said group of matrix registers according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers; and
wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders:
4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
-
12. A method to use instructions in a Matrix Processor that rearranges data between vector and matrix forms of an A×B matrix of data where the data matrix includes one or more 4×4 sub-matrices of data, comprising:
providing 16 processing elements where an individual processing element (PE) comprises one or more PE register entries in a PE register file;
providing a mesh row column interconnect that couples said processing elements into a 4×4 matrix processing array;
providing 16 matrix registers wherein an individual matrix register comprises an individual PE register entry from each said PE register file from each said individual processing element that are then combined together to form said individual matrix register; and
simultaneously swapping rows or columns between a group of said 16 matrix registers that comprise a first, second, third, and fourth matrix register according to the instructions that perform predefined matrix tensor operations on the data matrix that includes one of the following group of operations:
swapping rows between said first, second, third, and fourth matrix register of said group of matrix registers, or swapping columns between said first, second, third, and fourth matrix register of said group of matrix registers;
wherein the swapping of rows or columns converts the data in the data matrix into one of the following matrix data orders:
4 vectors of the larger data matrix to a 4×4 data sub-matrix in row major order, and 4 vectors of the larger data matrix to a 4×4 data sub-matrix in column major order.
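Claims 8 through 12 recite converting 4 vectors of the larger data matrix into a 4×4 data sub-matrix in row major order. The Python sketch below illustrates one way such a conversion could come out of a single row swap. It rests on assumptions the claims do not state: the larger matrix is taken to be 4×16, each 16-element row vector is assumed to fill one matrix register four elements per register row, and row i of register k is assumed to trade places with row k of register i. The names `load_vectors`, `swap_rows`, and `sub_matrix` are hypothetical.

```python
N = 4  # four matrix registers, each holding a 4x4 array


def load_vectors(matrix):
    """Place each 16-element row vector in its own register, 4 elements per register row."""
    return [[[row[4 * i + j] for j in range(N)] for i in range(N)] for row in matrix]


def swap_rows(regs):
    """Assumed row swap: row i of register k trades places with row k of register i."""
    return [[[regs[i][k][j] for j in range(N)] for i in range(N)] for k in range(N)]


def sub_matrix(matrix, k):
    """The k-th 4x4 block of a 4x16 matrix (columns 4k to 4k+3), in row major order."""
    return [row[4 * k: 4 * k + 4] for row in matrix[:N]]


# 4x16 data matrix; each value encodes its position as 100 * row + column.
data = [[100 * r + c for c in range(16)] for r in range(N)]

regs = swap_rows(load_vectors(data))

# After one row swap, register k holds the k-th 4x4 sub-matrix in row major order.
for k in range(N):
    assert regs[k] == sub_matrix(data, k)
```

Before the swap, each register holds one full vector of the larger matrix. After it, each register holds one 4×4 block, which is the vector form to matrix form rearrangement the claims describe, under the layout assumed here.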
Specification