SM4 acceleration processors, methods, systems, and instructions
Abstract
A processor of an aspect includes a plurality of packed data registers and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands, which are to have four 32-bit results of four prior SM4 cryptographic rounds and four 32-bit values. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. In response to the instruction, the execution unit is to store four 32-bit results of four immediately subsequent and sequential SM4 cryptographic rounds in a destination storage location indicated by the instruction.
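For orientation, the recurrence that such an instruction evaluates four times in a row can be written out. This is the standard SM4 round function, given as background rather than taken from the abstract. It assumes, as the claims below state, that the four 32-bit values are the round keys rk_i, and it uses τ for the byte-wise S-box substitution defined in the SM4 standard.

```latex
\begin{aligned}
X_{i+4} &= X_i \oplus T\!\left(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i\right), \qquad i = j,\ j+1,\ j+2,\ j+3,\\
T(A) &= L\!\left(\tau(A)\right),\\
L(B) &= B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24).
\end{aligned}
```

Because X_{j+5} depends on X_{j+4}, and so on, the four rounds are strictly sequential: from X_j through X_{j+3} the instruction produces X_{j+4} through X_{j+7}.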
36 Claims
1. A system on a chip (SoC) comprising:
an integrated memory controller; and
a processor core coupled to the integrated memory controller, the processor core comprising:
a data cache;
a data translation lookaside buffer (TLB) coupled to the data cache;
a branch prediction unit;
an instruction cache;
an instruction TLB coupled to the instruction cache;
an instruction fetch unit to fetch instructions, including an instruction;
a level 2 (L2) cache coupled to the data cache, and coupled to the instruction cache;
a plurality of registers to store single instruction, multiple data (SIMD) data, including a first register, and a second register, the first register to store a first source data that includes four source data elements to be encrypted with an SM4 cryptographic algorithm, the second register to store a second source data that includes four round keys, wherein the plurality of registers are dynamically allocated using register renaming;
a decode unit to decode the instruction, the instruction having a first field to specify the first register, and a second field to specify the second register; and
an execution unit coupled to the decode unit, and coupled to the plurality of registers, the execution unit, in response to the decode of the instruction, to generate and store a result in the first register, the result to include four result data elements that include the first source data encrypted by four corresponding encryption rounds of the SM4 cryptographic algorithm, wherein the execution unit is to generate each of the four result data elements to be consistent with an evaluation of a linear substitution function with a value for the corresponding encryption round, which is equal to the value logically XOR'd with the value rotated left by two bits logically XOR'd with the value rotated left by ten bits logically XOR'd with the value rotated left by eighteen bits logically XOR'd with the value rotated left by twenty-four bits.
Dependent claims: 2-18.
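The linear substitution function recited at the end of claim 1 can be transcribed directly. The following is a minimal Python sketch, assuming 32-bit words held in ordinary Python integers; the names `rotl32` and `sm4_linear` are hypothetical helpers, not identifiers from the patent.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits (0 <= n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm4_linear(b: int) -> int:
    """Hypothetical helper for the claimed linear substitution L:
    the value XOR'd with itself rotated left by 2, 10, 18 and 24 bits."""
    b &= MASK32
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


if __name__ == "__main__":
    print(hex(sm4_linear(0x00000001)))  # 0x1040405
    print(hex(sm4_linear(0x80000000)))  # 0x80820202
```

Since every term is either a rotation or an XOR, the function is linear over GF(2): `sm4_linear(a ^ b) == sm4_linear(a) ^ sm4_linear(b)` for any 32-bit `a` and `b`.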
19. A system on a chip (SoC) comprising:
an integrated memory controller; and
a processor core coupled to the integrated memory controller, the processor core comprising:
a plurality of registers to store single instruction, multiple data (SIMD) data, including a first register, and a second register, the first register to store a first source data that includes four source data elements to be encrypted with an SM4 cryptographic algorithm, the second register to store a second source data that includes four round keys, wherein the plurality of registers are dynamically allocated using register renaming;
a decode unit to decode an instruction, the instruction having a first field to specify the first register, and a second field to specify the second register; and
an execution unit coupled to the decode unit, and coupled to the plurality of registers, the execution unit, in response to the decode of the instruction, to generate and store a result in the first register, the result to include four result data elements that include the first source data encrypted by four corresponding encryption rounds of the SM4 cryptographic algorithm, wherein the execution unit is to generate each of the four result data elements to be consistent with an evaluation of a linear substitution function with a value for the corresponding encryption round, which is equal to the value logically XOR'd with the value rotated left by two bits logically XOR'd with the value rotated left by ten bits logically XOR'd with the value rotated left by eighteen bits logically XOR'd with the value rotated left by twenty-four bits.
Dependent claims: 20-32.
33. A method performed by a system on a chip (SoC) comprising:
accessing data from a memory with an integrated memory controller of the SoC;
dynamically allocating a plurality of registers that are used to store single instruction, multiple data (SIMD) data using register renaming;
receiving a first source data including four source data elements, which are to be encrypted with an SM4 cryptographic algorithm, from a first register of the plurality of registers;
receiving a second source data including four round keys from a second register of the plurality of registers;
decoding an instruction having a first field specifying the first register, and a second field specifying the second register;
generating a result, in response to the decode of the instruction, the result including four result data elements that include the first source data encrypted by four corresponding encryption rounds of the SM4 cryptographic algorithm, each of the four result data elements is generated to be consistent with an evaluation of a linear substitution function with a value for the corresponding encryption round, which is equal to the value logically XOR'd with the value rotated left by two bits logically XOR'd with the value rotated left by ten bits logically XOR'd with the value rotated left by eighteen bits logically XOR'd with the value rotated left by twenty-four bits; and
storing the result in the first register in response to the decode of the instruction.
Dependent claims: 34.
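The register-level data flow of claim 33 (read four elements from the first register and four round keys from the second, then overwrite the first register with four round results) can be modeled in software. This sketch is hypothetical throughout: the register file is a Python dict, the element order within a 128-bit register (element 0 in the least significant 32 bits) is an assumption, register renaming is not modeled, and the SM4 S-box, a fixed 256-entry table defined by the SM4 standard, is passed in as `sbox` rather than reproduced.

```python
from typing import Dict, List, Sequence

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits (0 <= n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_transform(a: int, sbox: Sequence[int]) -> int:
    """SM4 T(A) = L(tau(A)): per-byte S-box lookup, then the linear map L."""
    b = 0
    for shift in (24, 16, 8, 0):
        b |= sbox[(a >> shift) & 0xFF] << shift
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def unpack4(reg: int) -> List[int]:
    """Split a 128-bit register value into four 32-bit elements.
    Assumption: element 0 occupies the least significant 32 bits."""
    return [(reg >> (32 * i)) & MASK32 for i in range(4)]


def pack4(elems: Sequence[int]) -> int:
    """Inverse of unpack4: place four 32-bit elements into one 128-bit value."""
    out = 0
    for i, e in enumerate(elems):
        out |= (e & MASK32) << (32 * i)
    return out


def execute_four_rounds(regs: Dict[str, int], first: str, second: str,
                        sbox: Sequence[int]) -> None:
    """Hypothetical model of the claimed instruction: four state elements come
    from register `first`, four round keys from register `second`; the four
    round results overwrite `first`, and `second` is left untouched."""
    x = unpack4(regs[first])
    for rk in unpack4(regs[second]):
        # One SM4 round: X_{i+4} = X_i ^ T(X_{i+1} ^ X_{i+2} ^ X_{i+3} ^ rk_i)
        x = x[1:] + [x[0] ^ t_transform(x[1] ^ x[2] ^ x[3] ^ rk, sbox)]
    regs[first] = pack4(x)
```

For example, `regs = {"v1": pack4([x0, x1, x2, x3]), "v2": pack4([rk0, rk1, rk2, rk3])}` followed by `execute_four_rounds(regs, "v1", "v2", sbox)` leaves the four new state words in `regs["v1"]` and `regs["v2"]` unchanged. Reusing the first register as the destination mirrors the claim, so the output of one invocation is already in place as the input of the next.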
35. An article of manufacture comprising a non-transitory machine-readable storage medium, the non-transitory machine-readable storage medium storing instructions, wherein the instructions, if executed by a machine including a system on a chip (SoC), are to cause the machine to perform operations comprising to:
access data from a memory with an integrated memory controller of a SoC;
receive a first source data including four source data elements, which are to be encrypted with an SM4 cryptographic algorithm, from a first register of a plurality of registers that are used to store single instruction, multiple data (SIMD) data;
receive a second source data including four round keys from a second register of the plurality of registers;
generate a result that is to include four result data elements that are to include the first source data encrypted by four corresponding encryption rounds of the SM4 cryptographic algorithm, wherein each of the four result data elements is to be generated to be consistent with an evaluation of a linear substitution function with a value for the corresponding encryption round, which is equal to the value logically XOR'd with the value rotated left by two bits logically XOR'd with the value rotated left by ten bits logically XOR'd with the value rotated left by eighteen bits logically XOR'd with the value rotated left by twenty-four bits; and
store the result in the first register.
Dependent claims: 36.
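Because each invocation covers four of the 32 SM4 rounds, software built around an instruction like the one in claim 35 would plausibly chain eight invocations per 128-bit block. The sketch below shows that chaining under stated assumptions: the 32 round keys have already been produced by the SM4 key schedule (not shown), the SM4 S-box is again taken as a parameter rather than reproduced, and `four_rounds` and `encrypt_block` are hypothetical names, with `four_rounds` standing in for one execution of the instruction.

```python
from typing import List, Sequence

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits (0 <= n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_transform(a: int, sbox: Sequence[int]) -> int:
    """SM4 T(A) = L(tau(A)): per-byte S-box lookup, then the linear map L."""
    b = 0
    for shift in (24, 16, 8, 0):
        b |= sbox[(a >> shift) & 0xFF] << shift
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def four_rounds(x: Sequence[int], rk4: Sequence[int], sbox: Sequence[int]) -> List[int]:
    """Stand-in for one execution of the instruction: X_j..X_{j+3} -> X_{j+4}..X_{j+7}."""
    s = list(x)
    for rk in rk4:
        s = s[1:] + [s[0] ^ t_transform(s[1] ^ s[2] ^ s[3] ^ rk, sbox)]
    return s


def encrypt_block(x: Sequence[int], round_keys: Sequence[int],
                  sbox: Sequence[int]) -> List[int]:
    """SM4 block encryption as eight chained four-round steps (32 rounds),
    then the reverse transform R, which outputs (X35, X34, X33, X32)."""
    if len(x) != 4 or len(round_keys) != 32:
        raise ValueError("expected 4 state words and 32 round keys")
    s = list(x)
    for j in range(0, 32, 4):
        # Each step consumes four round keys; its output is the next step's input.
        s = four_rounds(s, round_keys[j:j + 4], sbox)
    return s[::-1]
```

The final reversal is SM4's reverse transform R. SM4 decryption is the same computation with the round keys in reverse order, so `encrypt_block(ct, round_keys[::-1], sbox)` recovers the plaintext.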
Specification