Apparatus and method of improved insert instructions
Abstract
An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.
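The behavior summarized in the abstract can be modeled in software. The following is a minimal sketch, not the claimed circuitry: vectors are Python lists of 32-bit elements, and the names `insert_group` and `apply_mask`, the 512-bit vector size, the 32-bit and 64-bit granularities, and the merge-style masking (masked-off positions keep the destination's prior value) are all assumptions made for illustration.

```python
ELEM_BITS = 32  # assumed base element width for this model


def insert_group(vector, group, section):
    """Return a copy of `vector` with `group` written into one section.

    The vector is treated as non-overlapping sections, each exactly as
    wide as `group`; `section` selects which one is overwritten.
    """
    width = len(group)
    if len(vector) % width != 0:
        raise ValueError("vector length must be a multiple of the group width")
    if not 0 <= section < len(vector) // width:
        raise ValueError("section index out of range")
    result = list(vector)
    start = section * width
    result[start:start + width] = group
    return result


def apply_mask(result, previous, mask_bits, granularity_bits):
    """Merge `result` into `previous` under a mask (assumed merge semantics).

    Each mask bit covers `granularity_bits` of the vector, that is,
    granularity_bits // ELEM_BITS consecutive 32-bit elements.
    """
    per_bit = granularity_bits // ELEM_BITS
    out = []
    for i, (new, old) in enumerate(zip(result, previous)):
        bit = (mask_bits >> (i // per_bit)) & 1
        out.append(new if bit else old)
    return out


# 512-bit destination modeled as sixteen 32-bit elements, all zero.
dest = [0] * 16
# A 128-bit group (four 32-bit elements) inserted into section 2 of 4.
inserted = insert_group(dest, [1, 2, 3, 4], section=2)
# The same inserted vector, masked at two different granularities.
fine = apply_mask(inserted, dest, mask_bits=0b0000_0101_0000_0000, granularity_bits=32)
coarse = apply_mask(inserted, dest, mask_bits=0b0001_0000, granularity_bits=64)
```

With the finer granularity each mask bit gates a single 32-bit element, so elements 8 and 10 can be written while 9 and 11 are left alone. With the coarser granularity one mask bit gates a 64-bit pair, so elements 8 and 9 are written or suppressed together.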
20 Claims
1. An apparatus comprising:
- a decoder to decode a first instruction with a first immediate operand into a decoded first instruction, a second instruction with a second immediate operand into a decoded second instruction, a third instruction with a third immediate operand into a decoded third instruction, and a fourth instruction with a fourth immediate operand into a decoded fourth instruction;
- instruction execution circuitry to execute:
  a) the decoded first instruction and the decoded second instruction, where execution of said decoded first instruction is to insert a first group of input vector elements to one of multiple first non-overlapping sections of a first resultant vector and execution of said decoded second instruction is to insert the first group of input vector elements to one of multiple first non-overlapping sections of a second resultant vector, said first group having a first bit width, each of said multiple first non-overlapping sections having a same bit width as said first group, and
  b) the decoded third instruction and the decoded fourth instruction, where execution of said decoded third instruction is to insert a second group of input vector elements to one of multiple second non-overlapping sections of a third resultant vector and execution of said decoded fourth instruction is to insert the second group of input vector elements to one of multiple second non-overlapping sections of a fourth resultant vector, said second group having a second bit width that is larger than said first bit width, each of said multiple second non-overlapping sections having a same bit width as said second group; and
- masking layer circuitry to mask the first resultant vector of said first instruction and the third resultant vector of the third instruction at a first resultant vector granularity specified by the first immediate operand and the third immediate operand, respectively, and mask the second resultant vector of said second instruction and the fourth resultant vector of the fourth instruction at a second resultant vector granularity specified by the second immediate operand and the fourth immediate operand, respectively.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
- decoding a first instruction into a decoded first instruction, a second instruction into a decoded second instruction, a third instruction into a decoded third instruction, and a fourth instruction into a decoded fourth instruction;
- executing the decoded first instruction including inserting a first group of input vector elements to one of multiple first non-overlapping sections of a first resultant vector, said first group having a first bit width, each of said multiple first non-overlapping sections having a same bit width as said first group, and masking said first group at a first granularity specified by a first immediate operand of the first instruction;
- executing the decoded second instruction including inserting a second group of input vector elements to one of multiple second non-overlapping sections of a second resultant vector, said second group having a second bit width, each of said multiple second non-overlapping sections having a same bit width as said second group, and masking said second group at a second granularity specified by a second immediate operand of the second instruction, said first granularity being finer than said second granularity;
- executing the decoded third instruction including inserting a third group of input vector elements to one of multiple third non-overlapping sections of a third resultant vector, said third group having said first bit width, each of said multiple third non-overlapping sections having a same bit width as said first group, and masking said third group at said second granularity specified by a third immediate operand of the third instruction; and
- executing the decoded fourth instruction including inserting a fourth group of input vector elements to one of multiple fourth non-overlapping sections of a fourth resultant vector, said fourth group having said second bit width, each of said multiple fourth non-overlapping sections having a same bit width as said second group, and masking said fourth group at said first granularity specified by a fourth immediate operand of the fourth instruction.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
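The four instructions of the method claim pair two group widths with two mask granularities. The sketch below enumerates those pairings in a small software model. The concrete widths (128 and 256 bits), the granularities (32 and 64 bits), the 512-bit destination, the `InsertOp` and `execute` names, and the choice to leave masked-off positions unchanged are hypothetical; the claim itself only requires two bit widths and two granularities, the first granularity being finer than the second.

```python
from dataclasses import dataclass

ELEM_BITS = 32  # assumed smallest element width in this model


@dataclass(frozen=True)
class InsertOp:
    name: str
    group_bits: int        # width of the inserted group and of each section
    granularity_bits: int  # width of the vector covered by one mask bit


# Hypothetical values: first/third share a width, first/fourth share the
# finer granularity, second/third share the coarser granularity.
OPS = [
    InsertOp("first", 128, 32),
    InsertOp("second", 256, 64),
    InsertOp("third", 128, 64),
    InsertOp("fourth", 256, 32),
]


def execute(op, dest, group, section, mask_bits):
    """Insert `group` into `section` of `dest`, gated by `mask_bits`."""
    group_elems = op.group_bits // ELEM_BITS
    if len(group) != group_elems:
        raise ValueError("group does not match the instruction's bit width")
    per_bit = op.granularity_bits // ELEM_BITS
    start = section * group_elems
    out = list(dest)
    for i, value in enumerate(group):
        pos = start + i
        # One mask bit gates `per_bit` consecutive 32-bit positions.
        if (mask_bits >> (pos // per_bit)) & 1:
            out[pos] = value
    return out


dest = [0] * 16  # 512-bit destination as sixteen 32-bit elements
r1 = execute(OPS[0], dest, [1, 2, 3, 4], section=1, mask_bits=0b0011_0000)
r3 = execute(OPS[2], dest, [1, 2, 3, 4], section=1, mask_bits=0b0100)
r2 = execute(OPS[1], dest, list(range(1, 9)), section=1, mask_bits=0b1000_0000)
```

In this model `r1` and `r3` write the same two elements, but the first does so with two fine-grained mask bits and the third with a single coarse-grained bit, which is the distinction the claim draws between instructions of equal group width.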
16. An apparatus comprising:
- a decoder to decode a first instruction into a decoded first instruction, a second instruction into a decoded second instruction, a third instruction into a decoded third instruction, and a fourth instruction into a decoded fourth instruction;
- instruction execution circuitry to execute:
  a) the decoded first instruction and the decoded second instruction, where execution of said decoded first instruction is to insert a first group of input vector elements to one of multiple first non-overlapping sections of a first resultant vector in accordance with a first immediate operand and execution of said decoded second instruction is to insert the first group of input vector elements to one of multiple first non-overlapping sections of a second resultant vector in accordance with a second immediate operand, said first group having a first bit width, each of said multiple first non-overlapping sections having a same bit width as said first group, and
  b) the decoded third instruction and the decoded fourth instruction, where execution of said decoded third instruction is to insert a second group of input vector elements to one of multiple second non-overlapping sections of a third resultant vector in accordance with a third immediate operand and execution of said decoded fourth instruction is to insert the second group of input vector elements to one of multiple second non-overlapping sections of a fourth resultant vector in accordance with a fourth immediate operand, said second group having a second bit width that is larger than said first bit width, each of said multiple second non-overlapping sections having a same bit width as said second group; and
- masking layer circuitry to mask the first resultant vector of said first instruction and the third resultant vector of the third instruction at a first resultant vector granularity specified by a fifth immediate operand of the first instruction and a sixth immediate operand of the third instruction, respectively, and mask the second resultant vector of said second instruction and the fourth resultant vector of the fourth instruction at a second resultant vector granularity specified by a seventh immediate operand of the second instruction and an eighth immediate operand of the fourth instruction, respectively.
Dependent claims: 17, 18, 19, 20
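Claim 16 differs from claim 1 in giving each instruction two immediates: one governing the insertion and one specifying the mask granularity. A decode step for such an instruction might look like the sketch below. Everything concrete here is an assumption for illustration: the opcode names `ins_narrow` and `ins_wide`, the 512-bit vector, the 128-bit and 256-bit group widths, and the encoding in which a granularity immediate of 0 means 32 bits and 1 means 64 bits. The claim does not define any of these encodings.

```python
from typing import NamedTuple

VECTOR_BITS = 512  # assumed resultant vector width

# Hypothetical encodings, chosen only to make the example concrete.
GROUP_BITS = {"ins_narrow": 128, "ins_wide": 256}
GRANULARITY_BITS = {0: 32, 1: 64}


class Decoded(NamedTuple):
    group_bits: int        # width of the inserted group
    section: int           # which non-overlapping section receives it
    granularity_bits: int  # width covered by one mask bit


def decode(opcode, position_imm, granularity_imm):
    """Turn an opcode and its two immediates into insertion parameters."""
    group_bits = GROUP_BITS[opcode]
    sections = VECTOR_BITS // group_bits
    # Only as many low-order values as there are sections are meaningful.
    section = position_imm % sections
    granularity_bits = GRANULARITY_BITS[granularity_imm & 1]
    return Decoded(group_bits, section, granularity_bits)


def mask_bits_needed(decoded):
    """Number of mask bits required to cover the whole resultant vector."""
    return VECTOR_BITS // decoded.granularity_bits


narrow_fine = decode("ins_narrow", position_imm=2, granularity_imm=0)
wide_coarse = decode("ins_wide", position_imm=1, granularity_imm=1)
```

Keeping the two immediates separate means the same opcode can be issued at either granularity, with the mask register interpreted as 16 bits in the finer case and 8 bits in the coarser case under these assumed widths.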
Specification