PARALLELIZING NON-COUNTABLE LOOPS WITH HARDWARE TRANSACTIONAL MEMORY
Abstract
A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
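The squash-and-re-execute behavior described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions: it uses no hardware transactional memory and none of the parallelization library (PL) functions, which this section does not name. The names `Node`, `body`, `speculative_sum`, and `pool`, and the predictor that guesses the list follows allocation order, are all hypothetical. The non-countable loop is a linked-list traversal, where the trip count is unknown before execution and the loop index (the node pointer) of a later iteration can only be guessed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity semantics: two nodes are equal only if they are the same object
class Node:
    value: int
    next: Optional["Node"] = None


def sequential_sum(head: Optional[Node]) -> int:
    """The original non-countable loop: the trip count is known only once p becomes None."""
    total, p = 0, head
    while p is not None:
        total += p.value
        p = p.next
    return total


def body(node: Node) -> Tuple[int, Optional[Node]]:
    """One iteration run like a transaction: results are returned (buffered), not committed."""
    return node.value, node.next


def speculative_sum(head: Optional[Node], pool: List[Node], width: int = 4) -> Tuple[int, int]:
    """Simulated speculation. Assumes every list node appears in `pool` (hypothetical predictor:
    the successor of pool[i] is guessed to be pool[i + 1]). Returns (sum, squash count)."""
    index_of = {id(n): i for i, n in enumerate(pool)}
    total, squashed = 0, 0
    p = head
    with ThreadPoolExecutor(max_workers=width) as ex:
        while p is not None:
            start = index_of[id(p)]
            # guesses[0] is p itself (non-speculative); the rest are speculative loop indices.
            guesses = pool[start:start + width]
            results = list(ex.map(body, guesses))
            # Commit strictly in iteration order, validating each guessed loop index.
            for guess, (contrib, nxt) in zip(guesses, results):
                if guess is not p:
                    # Misprediction stands in for a transaction failure:
                    # discard the buffered result and re-execute with the real index.
                    squashed += 1
                    contrib, nxt = body(p)
                total += contrib
                p = nxt
                if p is None:
                    break  # loop exit reached; remaining speculative results are dropped
    return total, squashed
```

For a list linked in pool order every guess validates and nothing is squashed. For a four-node list linked as first, third, second, fourth, two speculative iterations are squashed and re-executed, and the committed sum still matches `sequential_sum`. In-order commit is the design choice that keeps the result identical to the sequential loop regardless of how many guesses fail.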
20 Claims
1. A method for parallelizing program code of an application, the method comprising:
- examining one or more program instructions of a multi-threaded application;
- identifying a non-countable loop pattern;
- replacing the non-countable loop pattern with a parallelized loop pattern, wherein the parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A compiler comprising:
- a processor core selection unit configured to assign software threads to waiting hardware threads;
- an optimizer; and
- a code generator;
wherein the optimizer is configured to:
- examine one or more program instructions of a multi-threaded application;
- identify a non-countable loop pattern;
- replace the non-countable loop pattern with a parallelized loop pattern, wherein the parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
Dependent claims: 11, 12, 13, 14, 15, 16, 17

18. A computer readable storage medium storing program instructions operable to parallelize program code of an application, wherein the program instructions are executable to:
- examine one or more program instructions of a multi-threaded application;
- identify a non-countable loop pattern;
- replace the non-countable loop pattern with a parallelized loop pattern, wherein the parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
Dependent claims: 19, 20

Specification