APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS
First Claim
1. A processor comprising:
a plurality of vector registers including a source vector register and a destination vector register;
instruction decode circuitry to decode an insert instruction, the insert instruction including a vector extension component and an immediate, the vector extension component comprising:
a first byte field to indicate a format of the insert instruction,
a second byte field to identify one or more subsets of the plurality of vector registers, the one or more subsets including the source vector register and the destination vector register, and to identify a corresponding opcode map for the insert instruction, and
a third byte field to indicate a packed data element length and to specify a portion of an opcode encoding corresponding to the insert instruction; and
an execution circuit to perform operations specified by the insert instruction, wherein following the instruction decode circuitry decoding the insert instruction, the execution circuit is to read 128 bits of data from the source vector register and insert the 128 bits of data into a specified position in the destination vector register,
wherein the specified position in which to insert the 128 bits of data is selected based on the immediate.
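The insert operation recited above can be illustrated with a minimal sketch. It models vector registers as Python integers and is not the claimed circuitry. The function name `insert128`, the 512-bit default destination width, the use of the source's low 128 bits, and the rule that the immediate is reduced modulo the number of 128-bit positions are all assumptions made for illustration; the claim states only that 128 bits are read from the source and that the position is selected based on the immediate.

```python
LANE_BITS = 128  # width of the inserted data, per the claim


def insert128(dest: int, src: int, imm: int, dest_bits: int = 512) -> int:
    """Return dest with one 128-bit position replaced by 128 bits of src.

    Assumptions (not stated in the claim): the low 128 bits of src are
    read, and imm is reduced modulo the number of 128-bit positions.
    """
    positions = dest_bits // LANE_BITS
    shift = (imm % positions) * LANE_BITS
    lane_mask = (1 << LANE_BITS) - 1
    data = src & lane_mask                  # read 128 bits from the source
    cleared = dest & ~(lane_mask << shift)  # clear the selected position
    return cleared | (data << shift)        # write the 128 bits there
```

For example, `insert128(0, 0xABCD, 2)` places the value in bits 256 to 383 of a 512-bit destination and leaves the other three positions unchanged.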
Abstract
An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.
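The interaction between section-wise insertion and element-wise masking can be sketched as follows. This is a hedged illustration, not the described circuitry. The function name `masked_insert`, the 512-bit resultant width, and the merge-style masking (unmasked elements keep the old destination value) are assumptions; the abstract does not say what happens to masked-off elements, and it gives no concrete numbers for the two group widths or the two granularities.

```python
def masked_insert(dest: int, src: int, imm: int, mask: int,
                  group_bits: int, elem_bits: int,
                  dest_bits: int = 512) -> int:
    """Insert a group_bits-wide group into the section chosen by imm,
    then apply a per-element write mask at elem_bits granularity.

    Assumption: merge masking, i.e. an element whose mask bit is 0
    keeps its previous value from dest.
    """
    sections = dest_bits // group_bits
    shift = (imm % sections) * group_bits
    group_mask = (1 << group_bits) - 1
    # Unmasked result of the insert: one section replaced by the group.
    inserted = (dest & ~(group_mask << shift)) | ((src & group_mask) << shift)

    result = 0
    elem_mask = (1 << elem_bits) - 1
    for i in range(dest_bits // elem_bits):
        chosen = inserted if (mask >> i) & 1 else dest
        result |= chosen & (elem_mask << (i * elem_bits))
    return result
```

Calling the same function with a hypothetical `elem_bits=32` or `elem_bits=64` models the two mask granularities, and a hypothetical `group_bits=128` or `group_bits=256` models the two group widths, which gives the four instruction variants the abstract describes.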
30 Claims
1. A processor comprising:
a plurality of vector registers including a source vector register and a destination vector register;
instruction decode circuitry to decode an insert instruction, the insert instruction including a vector extension component and an immediate, the vector extension component comprising:
a first byte field to indicate a format of the insert instruction,
a second byte field to identify one or more subsets of the plurality of vector registers, the one or more subsets including the source vector register and the destination vector register, and to identify a corresponding opcode map for the insert instruction, and
a third byte field to indicate a packed data element length and to specify a portion of an opcode encoding corresponding to the insert instruction; and
an execution circuit to perform operations specified by the insert instruction, wherein following the instruction decode circuitry decoding the insert instruction, the execution circuit is to read 128 bits of data from the source vector register and insert the 128 bits of data into a specified position in the destination vector register,
wherein the specified position in which to insert the 128 bits of data is selected based on the immediate.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method comprising:
decoding an insert instruction by instruction decode circuitry, the decoding to cause a plurality of vector registers to be accessed including a source vector register and a destination vector register;
the insert instruction including a vector extension component and an immediate, the vector extension component comprising:
a first byte field to indicate a format of the insert instruction,
a second byte field to identify one or more subsets of the plurality of vector registers, the one or more subsets including a source vector register and a destination vector register, and to identify a corresponding opcode map for the insert instruction, and
a third byte field to indicate a packed data element length and to specify a portion of an opcode encoding corresponding to the insert instruction; and
performing operations specified by the insert instruction following the decoding of the insert instruction, the operations including reading 128 bits of data from the source vector register and inserting the 128 bits of data into a specified position in the destination vector register,
wherein the specified position in which to insert the 128 bits of data is selected based on the immediate.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A non-transitory machine-readable medium comprising program code which, when executed by a machine, causes the machine to perform operations of:
decoding an insert instruction by instruction decode circuitry, the decoding to cause a plurality of vector registers to be accessed including a source vector register and a destination vector register;
the insert instruction including a vector extension component and an immediate, the vector extension component comprising:
a first byte field to indicate a format of the insert instruction,
a second byte field to identify one or more subsets of the plurality of vector registers, the one or more subsets including a source vector register and a destination vector register, and to identify a corresponding opcode map for the insert instruction, and
a third byte field to indicate a packed data element length and to specify a portion of an opcode encoding corresponding to the insert instruction; and
performing operations specified by the insert instruction following the decoding of the insert instruction, the operations including reading 128 bits of data from the source vector register and inserting the 128 bits of data into a specified position in the destination vector register,
wherein the specified position in which to insert the 128 bits of data is selected based on the immediate.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification