PIPELINED PARALLELIZATION WITH LOCALIZED SELF-HELPER THREADING
First Claim
1. A computer readable storage medium storing program instructions, wherein the program instructions are executable to:
- identify a parallel region with cross-iteration dependences within program instructions of a computer program;
parallelize the parallel region into a plurality of threads, wherein each of said threads comprises a same set of algorithmic instructions;
generate helper thread instructions based at least in part on the algorithmic instructions;
insert the helper thread instructions in each of the plurality of threads; and
insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method for automatically parallelizing a computer program for multi-threaded execution. A compiler identifies and parallelizes non-DOALL parallel regions, such as loops, within a computer program. The compiler determines enhanced helper thread instructions based upon the main body instructions of the non-DOALL region. These helper thread instructions are inserted ahead of the main body instructions within each of the plurality of threads, rather than within a single main thread. Next, synchronization instructions are inserted in one or more threads such that the main body of work of each thread is performed in a pipelined manner. The helper thread instructions within each thread may reduce the total execution time of each thread.
54 Citations
20 Claims
-
1. A computer readable storage medium storing program instructions, wherein the program instructions are executable to:
-
identify a parallel region with cross-iteration dependences within program instructions of a computer program; parallelize the parallel region into a plurality of threads, wherein each of said threads comprises a same set of algorithmic instructions; generate helper thread instructions based at least in part on the algorithmic instructions; insert the helper thread instructions in each of the plurality of threads; and insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method comprising:
-
identifying a parallel region with cross-iteration dependences within program instructions of a computer program; parallelizing the parallel region into a plurality of threads, wherein each thread comprises a same set of algorithmic instructions; generating helper thread instructions based at least in part on the algorithmic instructions; inserting the helper thread instructions in each of the plurality of threads; and inserting synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A compiler comprising:
-
code generation instructions; and optimization instructions configured to; identify a parallel region with cross-iteration dependences within program instructions of a computer program; parallelize the parallel region into a plurality of threads, wherein each thread comprises a same set of algorithmic instructions; and determine helper thread instructions based upon the algorithmic instructions; and wherein the code generation instructions are configured to; insert the helper thread instructions in each of the plurality of threads; and insert synchronization instructions in one or more of the plurality of threads, wherein the synchronization instructions are configured to cause execution of the algorithmic instructions of each of the plurality of threads to occur in program order with respect to each other thread of the plurality of threads. - View Dependent Claims (18, 19, 20)
-
Specification