System, method, and computer program product for bulk synchronous binary program translation and optimization
Abstract
A system, method, and computer program product are provided for bulk synchronous binary program translation and optimization. The method includes the steps of executing a block of translated binary instructions by multiple threads and gathering profiling data during execution of the block of translated binary instructions. The multiple threads are then synchronized at a barrier instruction associated with the block of translated binary instructions, and the block of translated binary instructions is replaced with optimized binary instructions, where the optimized binary instructions are produced based on the profiling data.
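The steps summarized in the abstract follow a bulk synchronous pattern: threads run the current block while recording profiling data, meet at a barrier, and the block is swapped before the next round. A minimal Python sketch of that control flow, run on ordinary CPU threads, is shown here. It relies on `threading.Barrier`, whose `action` callable is invoked by one of the threads after all parties have arrived and before they are released, which gives a single safe point to replace the block. Everything else (the toy operations standing in for translated instructions, the `optimize` rule that drops an identity operation, and `run_bulk_synchronous`) is hypothetical and is not the patented implementation.

```python
import threading
from collections import Counter


# Hypothetical stand-ins for translated binary instructions.
def add_one(x):
    return x + 1


def add_zero(x):
    return x + 0  # identity operation that the toy optimizer removes


def times_two(x):
    return x * 2


# Hypothetical "translated block": operations applied in order.
translated_block = [add_one, add_zero, times_two]


def optimize(block, profile):
    """Illustrative optimization driven by profiling data.

    If profiling shows add_zero executed, return a copy of the block without it;
    otherwise return an unchanged copy.
    """
    if profile["add_zero"] > 0:
        return [op for op in block if op is not add_zero]
    return list(block)


def run_bulk_synchronous(inputs, rounds=2):
    n = len(inputs)
    state = {"block": translated_block}
    per_thread = [Counter() for _ in range(n)]  # one profile per thread, no sharing
    results = [[] for _ in range(n)]
    history = []  # op names of the block installed at each barrier

    def replace_block():
        # Called once per phase by one thread, after all threads reach the
        # barrier and before any is released, so the swap is race-free.
        merged = sum(per_thread, Counter())
        state["block"] = optimize(state["block"], merged)
        history.append([op.__name__ for op in state["block"]])

    barrier = threading.Barrier(n, action=replace_block)

    def worker(tid):
        for _ in range(rounds):
            x = inputs[tid]
            for op in state["block"]:
                per_thread[tid][op.__name__] += 1  # gather profiling data
                x = op(x)
            results[tid].append(x)
            barrier.wait()  # synchronize all threads at the barrier

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, history


if __name__ == "__main__":
    results, history = run_bulk_synchronous([1, 2, 3])
    print(results)  # [[4, 4], [6, 6], [8, 8]]
    print(history)  # [['add_one', 'times_two'], ['add_one', 'times_two']]
```

In this toy run the optimized block produces the same outputs as the original block in the second round, while executing one fewer operation per thread.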
8 Citations
16 Claims
1. A method comprising:
executing, on a parallel processor, a block of translated binary instructions by multiple threads;
gathering profiling data during execution of the block of translated binary instructions;
synchronizing the multiple threads at a barrier instruction associated with the block of translated binary instructions, wherein the barrier instruction specifies a barrier hierarchy level;
replacing the block of translated binary instructions with optimized binary instructions, wherein the optimized binary instructions are produced based on the profiling data;
determining a lower level barrier than the specified barrier hierarchy level is supported; and
comparing the optimized binary instructions with one or more versions of binary instructions for the block that are associated with different multiple threads.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps comprising:
executing a block of translated binary instructions by multiple threads;
gathering profiling data during execution of the block of translated binary instructions;
synchronizing the multiple threads at a barrier instruction associated with the block of translated binary instructions, wherein the barrier instruction specifies a barrier hierarchy level;
replacing the block of translated binary instructions with optimized binary instructions, wherein the optimized binary instructions are produced based on the profiling data;
determining a lower level barrier than the specified barrier hierarchy level is supported; and
comparing the optimized binary instructions with one or more versions of binary instructions for the block that are associated with different multiple threads.

12. A system comprising:
a memory configured to store a block of translated binary instructions; and
a plurality of multithreaded processing units that are included within a parallel processor and are coupled to the memory and configured to:
execute the block of translated binary instructions by multiple threads;
gather profiling data during execution of the block of translated binary instructions;
synchronize the multiple threads at a barrier instruction associated with the block of translated binary instructions, wherein the barrier instruction specifies a barrier hierarchy level;
replace the block of translated binary instructions with optimized binary instructions, wherein the optimized binary instructions are produced based on the profiling data;
determine a lower level barrier than the specified barrier hierarchy level is supported; and
compare the optimized binary instructions with one or more versions of binary instructions for the block that are associated with different multiple threads.
Dependent claims: 13, 14, 15, 16
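Two steps recited in each independent claim, determining that a barrier lower than the specified hierarchy level is supported and comparing the optimized instructions with other versions of the block, admit more than one reading. A minimal Python sketch of one possible reading follows. The three-level hierarchy (`WARP`, `THREAD_BLOCK`, `GRID`), the rule of choosing the highest supported level below the specified one, and the dictionary mapping hypothetical thread-group ids to instruction sequences are all assumptions for illustration, not details taken from the claims.

```python
from enum import IntEnum


class BarrierLevel(IntEnum):
    """Hypothetical barrier hierarchy, ordered from narrowest to widest scope."""
    WARP = 0
    THREAD_BLOCK = 1
    GRID = 2


def lower_supported_level(specified, supported):
    """Return the highest supported level strictly below `specified`, or None.

    Assumption: a lower (narrower) barrier is preferred when the hardware
    supports it, because it synchronizes fewer threads.
    """
    candidates = [level for level in supported if level < specified]
    return max(candidates) if candidates else None


def matching_version(optimized, versions):
    """Return the id of the first stored version identical to `optimized`, or None.

    `versions` maps a hypothetical thread-group id to the instruction
    sequence that group uses for the same block.
    """
    for group_id, instructions in versions.items():
        if instructions == optimized:
            return group_id
    return None


if __name__ == "__main__":
    supported = {BarrierLevel.WARP, BarrierLevel.THREAD_BLOCK}
    print(lower_supported_level(BarrierLevel.GRID, supported).name)  # THREAD_BLOCK
    versions = {"g0": ["add", "nop", "mul"], "g1": ["add", "mul"]}
    print(matching_version(["add", "mul"], versions))  # g1
```

Under this reading, finding a match would let a translator reuse an existing version of the block instead of storing a duplicate; that use is likewise an assumption.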
Specification