Insertion of multithreaded execution synchronization points in a software program
First Claim
1. A method for inserting synchronization points in a software program, comprising:
- identifying a post-dominate first node of a control flow graph that represents the software program, wherein every path from a second node of the control flow graph passes through the post-dominate first node, wherein the post-dominate first node is a post-tail node of a loop and the second node is a pre-header node of the loop;
inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program;
inserting a pop synchronization instruction into the post-dominate first node to create a first synchronization point in the software program where divergent execution threads in a thread group wait for one or more clock cycles, wherein the pop synchronization instruction causes the synchronization token to be popped from the stack during execution of the software program after all execution threads in the thread group have reached the post-dominate first node and are synchronized;
inserting a push synchronization instruction into a third node within the loop that includes a conditional branch;
inserting a pop synchronization instruction into a fourth node within the loop that merges the conditional branch to create a second synchronization point for the conditional branch;
identifying a run away path that does not pass through the second synchronization point and connects a fifth node to a sixth node that are each within the loop;
inserting a push break instruction that specifies an address of the sixth node into a seventh node that is a loop head of the loop;
replacing a branch that corresponds to an edge into the sixth node with a break to the sixth node; and
inserting a pop synchronization instruction in the sixth node to create a third synchronization point in the software program that synchronizes any threads taking the run away path with other threads in the thread group.
1 Assignment
0 Petitions
Accused Products
Abstract
A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of the certain instructions that require synchronous execution as defined in some graphics APIs and when synchronous execution improves performance based on a SIMD architecture.
55 Citations
10 Claims
-
1. A method for inserting synchronization points in a software program, comprising:
-
identifying a post-dominate first node of a control flow graph that represents the software program, wherein every path from a second node of the control flow graph passes through the post-dominate first node, wherein the post-dominate first node is a post-tail node of a loop and the second node is a pre-header node of the loop; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the post-dominate first node to create a first synchronization point in the software program where divergent execution threads in a thread group wait for one or more clock cycles, wherein the pop synchronization instruction causes the synchronization token to be popped from the stack during execution of the software program after all execution threads in the thread group have reached the post-dominate first node and are synchronized; inserting a push synchronization instruction into a third node within the loop that includes a conditional branch; inserting a pop synchronization instruction into a fourth node within the loop that merges the conditional branch to create a second synchronization point for the conditional branch; identifying a run away path that does not pass through the second synchronization point and connects a fifth node to a sixth node that are each within the loop; inserting a push break instruction that specifies an address of the sixth node into a seventh node that is a loop head of the loop; replacing a branch that corresponds to an edge into the sixth node with a break to the sixth node; and inserting a pop synchronization instruction in the sixth node to create a third synchronization point in the software program that synchronizes any threads taking the run away path with other threads in the thread group.
-
-
2. A method for inserting synchronization points in a software program, comprising:
-
identifying a first node of a control flow graph that represents the software program, wherein the first node is a post-tail node of a loop that post-dominates a second node that is a dominate node of the control flow graph and a pre-header node of the loop; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the first node to create a first synchronization point in the software program that causes the synchronization token to be popped from the stack during execution of the software program when execution threads in a thread group are synchronized; creating a second synchronization point within the loop; identifying an additional exit path from a third node that jumps out of the loop and does not pass through the second synchronization point; creating a pre-tail node that provides a path to a loop tail node of the loop and intercepts all paths input to the loop tail node; creating a temp node between the third node and the pre-tail node to intercept the additional exit path; assigning a temporary variable to a first value in the temp node; initializing the temporary variable to a second value in the second node; and inserting a conditional branch in the pre-tail block that is based on the temporary variable to provide a path to the first node. - View Dependent Claims (3)
-
-
4. A method for inserting synchronization points in a software program, comprising:
-
identifying a post-dominate first node of a control flow graph that represents the software program, wherein every path from a second node of the control flow graph passes through the post-dominate first node; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the post-dominate first node to create a first synchronization point in the software program where divergent execution threads in a thread group wait for one or more clock cycles, wherein the pop synchronization instruction causes the synchronization token to be popped from the stack during execution of the software program after all execution threads in the thread group have reached the post-dominate first node and are synchronized; identifying a run away path that does not pass through the first synchronization point and connects a third node to a fourth node that are each dominated by the second node; inserting a push break instruction that specifies an address of the fourth node into a fifth node that is a pre-header of the second node; replacing a branch that corresponds to an edge into the fourth node with a break to the fourth node; and inserting a pop synchronization instruction in the fourth node to create a second synchronization point in the software program that synchronizes any threads taking the run away path with other threads in the thread group.
-
-
5. A non-transitory computer readable medium storing instructions for causing a processor to insert synchronization points in a software program by performing the steps of:
-
identifying a post-dominate first node of a control flow graph that represents the software program, wherein every path from a second node of the control flow graph passes through the post-dominate first node, wherein the post-dominate first node is a post-tail node of a loop and the second node is a pre-header node of the loop; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the post-dominate first node to create a first synchronization point in the software program where divergent execution threads in a thread group wait for one or more clock cycles, wherein the pop synchronization instruction causes the synchronization token to be popped from the stack during execution of the software program after all execution threads in the thread group have reached the post-dominate first node and are synchronized; inserting a push synchronization instruction into a third node within the loop that includes a conditional branch; inserting a pop synchronization instruction into a fourth node within the loop that merges the conditional branch to create a second synchronization point for the conditional branch; identifying a run away path that does not pass through the second synchronization point and connects a fifth node to a sixth node that are each within the loop; inserting a push break instruction that specifies an address of the sixth node into a seventh node that is a loop head of the loop; replacing a branch that corresponds to an edge into the sixth node with a break to the sixth node; and inserting a pop synchronization instruction in the sixth node to create a third synchronization point in the software program that synchronizes any threads taking the run away path with other threads in the thread group.
-
-
6. A non-transitory computer readable medium storing instructions for causing a processor to insert synchronization points in a software program by performing the steps of:
-
identifying a first node of a control flow graph that represents the software program, wherein the first node is a post-tail node of a loop that post-dominates a second node that is a dominate node of the control flow graph and a pre-header node of the loop; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the first node to create a first synchronization point in the software program that causes the synchronization token to be popped from the stack during execution of the software program when execution threads in a thread group are synchronized; creating a second synchronization point within the loop; identifying an additional exit path from a third node that jumps out of the loop and does not pass through the second synchronization point; creating a pre-tail node that provides a path to a loop tail node of the loop and intercepts all paths input to the loop tail node; creating a temp node between the third node and the pre-tail node to intercept the additional exit path; assigning a temporary variable to a first value in the temp node; initializing the temporary variable to a second value in the second node; and inserting a conditional branch in the pre-tail block that is based on the temporary variable to provide a path to the first node. - View Dependent Claims (7)
-
-
8. A non-transitory computer readable medium storing instructions for causing a processor to insert synchronization points in a software program by performing the steps of:
-
identifying a post-dominate first node of a control flow graph that represents the software program, wherein every path from a second node of the control flow graph passes through the post-dominate first node; inserting a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; inserting a pop synchronization instruction into the post-dominate first node to create a first synchronization point in the software program where divergent execution threads in a thread group wait for one or more clock cycles, wherein the pop synchronization instruction causes the synchronization token to be popped from the stack during execution of the software program after all execution threads in the thread group have reached the post-dominate first node and are synchronized; identifying a run away path that does not pass through the first synchronization point and connects a third node to a fourth node that are each dominated by the second node; inserting a push break instruction that specifies an address of the fourth node into a fifth node that is a pre-header of the second node; replacing a branch that corresponds to an edge into the fourth node with a break to the fourth node; and inserting a pop synchronization instruction in the fourth node to create a second synchronization point in the software program that synchronizes any threads taking the run away path with other threads in the thread group.
-
-
9. A computing system for inserting synchronization points in a software program, comprising:
-
a compiler program configured to; identify a first node of a control flow graph that represents the software program, wherein the first node is a post-tail node of a loop that post-dominates a second node that is a dominate node of the control flow graph and a pre-header node of the loop; insert a push synchronization instruction into the second node to push a synchronization token onto a stack during execution of the software program; insert a pop synchronization instruction into the first node to create a first synchronization point in the software program that causes the synchronization token to be popped from the stack during execution of the software program when execution threads in a thread group are synchronized; identify an additional exit path from a third node that jumps out of the loop and does not pass through the second synchronization point; create a pre-tail node that provides a path to a loop tail node of the loop and intercepts all paths input to the loop tail node; create a temp node between the third node and the pre-tail node to intercept the additional exit path; assign a temporary variable to a first value in the temp node; initialize the temporary variable to a second value in the second node; and insert a conditional branch in the pre-tail block that is based on the temporary variable to provide a path to the first node that bypasses the loop tail block; and a memory configured to store the compiler program and the software program; and a processor coupled to the memory and configured to execute the compiler program. - View Dependent Claims (10)
-
Specification