Prefetch for systems with heterogeneous architectures
First Claim
1. A method comprising:
- generating in an intermediate code representation a prefetch instruction and a launch instruction corresponding to an instruction, in a source program, that indicates an operation to be performed on a second processor; and
- performing one or more compiler optimizations on the intermediate code representation to generate a binary file, the binary file including first machine instructions of the target processor for the prefetch instruction and the launch instruction and at least one other instruction, as well including one or more second machine instructions of the second processor to be executed by the second processor responsive to the target processor's execution of the launch instruction, the binary file further being structured so that the at least one other instruction is to be executed on the target processor while the second processor executes the second machine instructions.
Abstract
A compiler for a heterogeneous system that includes both one or more primary processors and one or more parallel co-processors is presented. For at least one embodiment, the primary processor(s) include a CPU and the parallel co-processor(s) include a GPU. Source code for the heterogeneous system may include code to be performed on the CPU but also code segments, referred to as “foreign macro-instructions”, that are to be performed on the GPU. An optimizing compiler for the heterogeneous system comprehends the architecture of both processors, and generates an optimized fat binary that includes machine code instructions for both the primary processor(s) and the co-processor(s). The optimizing compiler compiles the foreign macro-instructions as if they were predefined functions of the CPU, rather than as remote procedure calls. The binary is the result of compiler optimization techniques, and includes prefetch instructions to load code and/or data into the GPU memory concurrently with execution of other instructions on the CPU. Other embodiments are described and claimed.
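The overlap described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: `ToyCoprocessor`, `prefetch`, `launch`, and `run_example` are hypothetical names, and a worker thread merely simulates a co-processor with its own memory. The sketch shows a prefetch that starts a copy and returns immediately, CPU work that proceeds while the copy is in flight, and a launch that waits only for the data it needs.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyCoprocessor:
    """Hypothetical stand-in for a co-processor with private memory."""

    def __init__(self):
        self.memory = {}    # simulated co-processor memory
        self._pending = {}  # name -> Future for an in-flight copy
        self._pool = ThreadPoolExecutor(max_workers=1)

    def prefetch(self, name, data):
        """Start copying data into co-processor memory; return at once."""
        def copy():
            self.memory[name] = list(data)
        self._pending[name] = self._pool.submit(copy)

    def launch(self, kernel, name):
        """Wait for the named copy, then run the kernel asynchronously."""
        self._pending.pop(name).result()
        return self._pool.submit(kernel, self.memory[name])

    def shutdown(self):
        self._pool.shutdown()


def run_example():
    gpu = ToyCoprocessor()
    gpu.prefetch("vec", range(1, 5))          # issued early
    cpu_partial = sum(range(10))              # CPU work overlaps the copy
    future = gpu.launch(lambda v: [x * x for x in v], "vec")
    cpu_total = cpu_partial + 1               # CPU continues during the kernel
    result = future.result()                  # collect the co-processor result
    gpu.shutdown()
    return cpu_total, result
```

Calling `run_example()` returns the CPU-side value together with the squared vector computed on the simulated co-processor. A real system would replace the worker thread with device transfers and kernel dispatch.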
104 Citations
26 Claims
1. A method comprising:
- generating in an intermediate code representation a prefetch instruction and a launch instruction corresponding to an instruction, in a source program, that indicates an operation to be performed on a second processor; and
- performing one or more compiler optimizations on the intermediate code representation to generate a binary file, the binary file including first machine instructions of the target processor for the prefetch instruction and the launch instruction and at least one other instruction, as well including one or more second machine instructions of the second processor to be executed by the second processor responsive to the target processor's execution of the launch instruction, the binary file further being structured so that the at least one other instruction is to be executed on the target processor while the second processor executes the second machine instructions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- a die package that includes a first processor and a second processor, said first and second processors being heterogeneous with respect to each other;
- a first memory coupled to said first processor and a second memory coupled to said second processor;
- a library to facilitate transport of instructions and data, related to a set of source instructions, between the first processor and the second memory, wherein said second memory is not shared by said first processor;
- said first and second processors to execute a single executable code image that has been compiled by an optimizing compiler such that the executable image includes one or more calls to the library to trigger transport of data for the set of source instructions to the second processor while the first processor concurrently executes one or more other instructions.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18
19. An article comprising a machine-accessible medium including instructions that when executed cause a system to:
- generate in an intermediate code representation a prefetch instruction and a launch instruction corresponding to an instruction, in a source program, that indicates one or more instructions to be performed on a second processor;
- wherein said launch instruction is to be executed as a predefined function of a target processor rather than as a remote procedure call; and
- perform one or more compiler optimizations on the intermediate code representation to generate a binary file, the binary file including first machine instructions of the target processor for the prefetch instruction and the launch instruction and at least one other instruction, as well including one or more second machine instructions of the second processor to be executed by the second processor responsive to the target processor's execution of the launch instruction, the binary file further being structured so that the at least one other instruction is to be executed on the target processor concurrent with the second processor's execution of the second machine instructions.
Dependent claims: 20, 21, 22, 23, 24, 25, 26
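The compile-time side of the independent claims can be sketched in the same spirit. The toy intermediate representation below is an assumption made for illustration: the tuple opcodes `def`, `cpu`, `foreign`, `prefetch`, and `launch`, and the functions `lower` and `hoist_prefetches`, are hypothetical and do not come from the patent. The sketch expands one foreign instruction into a prefetch and a launch, then applies a simple scheduling pass that moves each prefetch up to just after the definition of its operand, so that ordinary CPU instructions sit between the prefetch and the launch.

```python
def lower(ir):
    """Expand each ('foreign', kernel, operand) into prefetch + launch."""
    out = []
    for op in ir:
        if op[0] == "foreign":
            _, kernel, operand = op
            out.append(("prefetch", operand))
            out.append(("launch", kernel, operand))
        else:
            out.append(op)
    return out


def hoist_prefetches(ir):
    """Move each prefetch to just after the last definition of its operand.

    If the operand has no definition in the sequence, the prefetch moves
    to the top. All other instructions keep their relative order.
    """
    out = []
    for op in ir:
        if op[0] != "prefetch":
            out.append(op)
            continue
        operand = op[1]
        position = 0
        for index, earlier in enumerate(out):
            if earlier[0] == "def" and earlier[1] == operand:
                position = index + 1
        out.insert(position, op)
    return out


# Hypothetical source program: define a, do CPU work, run kernel k on a.
program = [("def", "a"), ("cpu", "f"), ("cpu", "g"), ("foreign", "k", "a")]
scheduled = hoist_prefetches(lower(program))
```

After both passes, `scheduled` places the prefetch of `a` directly after its definition and leaves the launch last, with the two CPU instructions in between. A production compiler would also have to respect memory dependences and transfer costs, which this sketch ignores.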
Specification