Pipelined parallelization with localized self-helper threading
First Claim
1. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable to:
- identify a parallel region with cross-iteration dependences within program instructions of a computer program;
parallelize the parallel region into a plurality of threads, wherein each of said threads comprises a same set of algorithmic instructions;
generate helper thread instructions based at least in part on the algorithmic instructions;
insert the helper thread instructions in each of the plurality of threads;
insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads;
determine that a first instruction of the helper thread instructions is dependent upon a second instruction of the helper thread instructions; and
set a next output of the second instruction to have a same value as that of a previous output of the second instruction in response to predicting that the second instruction will not be executed.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method for automatically parallelizing a computer program for multi-threaded execution. A compiler identifies and parallelizes non-DOALL parallel regions, such as loops, within a computer program. The compiler determines enhanced helper thread instructions based upon the main body instructions of the non-DOALL region. These helper thread instructions are inserted ahead of the main body instructions within each of the plurality of threads, rather than within a single main thread. Next, synchronization instructions are inserted in one or more threads such that the main body of work of each thread is performed in a pipelined manner. The helper thread instructions within each thread may reduce the total execution time of each thread.
-
Citations
20 Claims
-
1. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable to:
-
identify a parallel region with cross-iteration dependences within program instructions of a computer program; parallelize the parallel region into a plurality of threads, wherein each of said threads comprises a same set of algorithmic instructions; generate helper thread instructions based at least in part on the algorithmic instructions; insert the helper thread instructions in each of the plurality of threads; insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads; determine that a first instruction of the helper thread instructions is dependent upon a second instruction of the helper thread instructions; and set a next output of the second instruction to have a same value as that of a previous output of the second instruction in response to predicting that the second instruction will not be executed. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A compilation method for automatically parallelizing a computer program executable by a processor comprising:
-
identifying a parallel region with cross-iteration dependences within program instructions of a computer program; parallelizing the parallel region into a plurality of threads, wherein each thread comprises a same set of algorithmic instructions; generating helper thread instructions based at least in part on the algorithmic instructions; inserting the helper thread instructions in each of the plurality of threads; and inserting synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads; determining that a first instruction of the helper thread instructions is dependent upon a second instruction of the helper thread instructions; and setting a next output of the second instruction to have a same value as that of a previous output of the second instruction in response to predicting that the second instruction will not be executed. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A compiler comprising:
-
a processor;
a memory coupled to the processor;code generation instructions stored in the memory and executed by the processor; and optimization instructions stored in the memory and executed by the processor, the optimization instructions configured to; identify a parallel region with cross-iteration dependences within program instructions of a computer program; parallelize the parallel region into a plurality of threads, wherein each thread comprises a same set of algorithmic instructions; and determine helper thread instructions based upon the algorithmic instructions; and wherein the code generation instructions are configured to; insert the helper thread instructions in each of the plurality of threads; and insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads; and wherein the optimization instructions are further configured to; determine that a first instruction of the helper thread instructions is dependent upon a second instruction of the helper thread instructions; and to set a next output of said second instruction to have a same value as that of a previous output of the second instruction in response to predicting that the second instruction will not be executed. - View Dependent Claims (18, 19, 20)
-
Specification