INSTRUCTIONS FOR DUAL DESTINATION TYPE CONVERSION, MIXED PRECISION ACCUMULATION, AND MIXED PRECISION ATOMIC MEMORY OPERATIONS

US 20180321937A1
Filed: 05/03/2017
Published: 11/08/2018
Est. Priority Date: 05/03/2017
Status: Active Grant

Abstract

Disclosed embodiments relate to instructions for dual-destination type conversion, accumulation, and atomic memory operations. In one example, a system includes a memory and a processor. The processor includes: a fetch circuit to fetch an instruction from a code storage, the instruction including an opcode, a first destination identifier, and a source identifier to specify a source vector register that holds a plurality of single precision floating point data elements; a decode circuit to decode the fetched instruction; and an execution circuit to execute the decoded instruction to convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
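
The conversion described in the abstract can be pictured with a small scalar model in C. This is an illustrative sketch only, not the patented hardware or any real intrinsic: the function name `cvt_ps_to_pd_dual` is hypothetical, and it assumes that the "first half" means the lower-indexed elements of the source. For a 512-bit source, `n` would be 16 and each destination would receive 8 doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a dual-destination conversion.
 * src holds n single precision elements (n must be even).
 * Assumption: the "first half" is the lower-indexed half of src. */
void cvt_ps_to_pd_dual(const float *src, size_t n,
                       double *dst_first, double *dst_second)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; ++i) {
        /* Widening float to double is exact, so no rounding occurs. */
        dst_first[i]  = (double)src[i];
        dst_second[i] = (double)src[half + i];
    }
}
```

Because each double is twice as wide as a float, the widened results of one full source register need two registers of the same width, which is why a second destination is required.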

20 Claims

1. A system used to execute an instruction, the system comprising:
- a memory;
- a processor comprising:
  - a fetch circuit to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements;
  - a decode circuit to decode the fetched instruction; and
  - an execution circuit to execute the decoded instruction to:
    - convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system of claim 1, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
  - 3. The system of claim 1, wherein the second location is the source vector register.
  - 4. The system of claim 3, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.
  - 5. The system of claim 1, wherein the execution circuit is further to add each vector element of the first half of the double precision floating point values to data previously stored in the first location and to store a first sum to the first location, and to add each vector element of the second half of the double precision floating point values to data previously stored in the second location and to store a second sum to the second location.
  - 6. The system of claim 1, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.
  - 7. The system of claim 6, wherein the execution circuit is further to:
    - perform a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point sums to the first location; and
    - perform a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point sums to the second location.
  - 8. The system of claim 1, wherein the execution circuit is to convert all elements of the source vector register in parallel.
  - 9. The system of claim 1, wherein the opcode is to specify that only a lower half of the source vector register is to be converted and stored to the first location.
  - 10. The system of claim 1, wherein the opcode is to specify that only an upper half of the source vector register is to be converted and stored to the first location.
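
The accumulation variant of claim 5 can be sketched in the same scalar style. The function name `cvt_ps_accumulate_pd_dual` is hypothetical, and the sketch again assumes that the first half is the lower-indexed half of the source; it models only the arithmetic, not any timing or encoding detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of mixed precision accumulation:
 * each single precision source element is widened to double and
 * added to the double already stored at the matching destination slot. */
void cvt_ps_accumulate_pd_dual(const float *src, size_t n,
                               double *acc_first, double *acc_second)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; ++i) {
        acc_first[i]  += (double)src[i];         /* first sum  */
        acc_second[i] += (double)src[half + i];  /* second sum */
    }
}
```

Accumulating in double precision matters when the running total is large relative to the addend. For example, adding 1.0f to 16777216.0f in single precision rounds back to 16777216.0f, whereas a double accumulator holds 16777217.0 exactly.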

11. A method of executing an instruction, the method comprising:
- fetching the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register comprising a plurality of single precision floating point data elements;
- decoding the fetched instruction by a decode circuit; and
- executing, by an execution circuit, the decoded instruction to:
  - convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision values to a second location.
- Dependent Claims (12, 13, 14, 15, 16, 17)
  - 12. The method of claim 11, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
  - 13. The method of claim 11, wherein the second location is the source vector register.
  - 14. The method of claim 13, wherein the source vector register, the first destination vector register, and the second destination vector register are 512-bit vector registers.
  - 15. The method of claim 11, further comprising:
    - adding, by the execution circuit, each of the first half of the double precision floating point values to data previously stored in the first location, and adding each of the second half of the double precision floating point values to data previously stored in the second location.
  - 16. The method of claim 11, wherein the locations identified by the first destination identifier and the second destination identifier are in the memory.
  - 17. The method of claim 16, further comprising accumulating results in the first location and the second location by:
    - performing a first atomic read-modify-write to read first data stored in the first location, add the first half of the double precision floating point values to the first data, and store double precision floating point results to the first location; and
    - performing a second atomic read-modify-write to read second data stored in the second location, add the second half of the double precision floating point values to the second data, and store double precision floating point results to the second location.
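
The atomic read-modify-write accumulation of claims 7 and 17 can be approximated in software with C11 atomics. This is a sketch under stated assumptions, not the claimed hardware behavior: the names `atomic_add_f64` and `cvt_ps_atomic_add_pd_dual` are hypothetical, and the model makes each element update atomic on its own by using a compare-exchange loop. Whether hardware would treat a whole half-vector as a single atomic unit is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically add `addend` to *addr using a compare-exchange retry loop. */
static void atomic_add_f64(_Atomic double *addr, double addend)
{
    double expected = atomic_load(addr);
    /* On failure, `expected` is refreshed with the current value, so the
     * sum is recomputed from up-to-date data on the next attempt. */
    while (!atomic_compare_exchange_weak(addr, &expected, expected + addend)) {
    }
}

/* Hypothetical model: widen each float and atomically accumulate it into
 * one of two memory destinations (assumption: first half = lower indices). */
void cvt_ps_atomic_add_pd_dual(const float *src, size_t n,
                               _Atomic double *mem_first,
                               _Atomic double *mem_second)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; ++i) {
        atomic_add_f64(&mem_first[i],  (double)src[i]);
        atomic_add_f64(&mem_second[i], (double)src[half + i]);
    }
}
```

A retry loop is used because C11 does not define `atomic_fetch_add` for floating point types.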

18. An apparatus for executing an instruction, the apparatus comprising:
- means for fetching an instruction, the means for fetching to fetch the instruction from a code storage, the instruction comprising an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register comprising a plurality of single precision floating point data elements;
- means for decoding to decode the fetched instruction; and
- means for executing the decoded instruction to:
  - convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
- Dependent Claims (19, 20)
  - 19. The apparatus of claim 18, wherein the instruction further comprises a second destination identifier, and wherein the second location is identified by the second destination identifier.
  - 20. The apparatus of claim 18, wherein the second location is the source vector register.
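
Claims 3, 13, and 20 let the source vector register double as the second location. A rough C model of that case is shown here. The type `vreg512_t` and the function `cvt_ps_to_pd_overwrite_src` are hypothetical, the 512-bit width is taken from claims 4 and 14, and the sketch assumes the first half is the lower-indexed half and that the first destination is a different register from the source.

```c
#include <assert.h>
#include <string.h>

enum { F32_LANES = 16, F64_LANES = 8 };

/* Hypothetical model of a 512-bit vector register viewed either as
 * 16 single precision lanes or as 8 double precision lanes. */
typedef union {
    float  f32[F32_LANES];
    double f64[F64_LANES];
} vreg512_t;

/* Widen all 16 floats in src; the first 8 results go to dst1 and the
 * last 8 results overwrite src itself. */
void cvt_ps_to_pd_overwrite_src(vreg512_t *dst1, vreg512_t *src)
{
    assert(dst1 != src);  /* assumption of this model, not of the claims */

    float snapshot[F32_LANES];
    /* Copy the source first: writing doubles into src would otherwise
     * clobber float lanes that have not been read yet. */
    memcpy(snapshot, src->f32, sizeof snapshot);

    for (int i = 0; i < F64_LANES; ++i) {
        dst1->f64[i] = (double)snapshot[i];
        src->f64[i]  = (double)snapshot[F64_LANES + i];
    }
}
```

Reusing the source as a destination means the instruction needs only one explicit destination identifier, at the cost of destroying the original single precision data.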

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Brown, William M.; Raman, Karthik

Granted Patent

US 10,698,685 B2

CPC Class Codes

G06F 9/30014   with variable precision

G06F 9/30025   Format conversion instructi...

G06F 9/30036   Instructions to perform ope...

G06F 9/3016   Decoding the operand specif...

G06F 9/3802   Instruction prefetching

G06F 9/3887   controlled by a single inst...
