Method and apparatus for profile-based reordering of program portions in a computer program
First Claim
1. An apparatus comprising:
- at least one processor;
a memory coupled to the at least one processor;
a computer program residing in the memory comprising a plurality of portions joined by a plurality of program paths;
profile data residing in the memory representing the execution frequency of at least one of the plurality of program paths within the computer program;
a reordering mechanism residing in the memory and executed by the at least one processor for determining an order for the plurality of portions of the computer program by constructing a plurality of traces, each trace comprising a possible execution path along the plurality of program paths through at least one of the plurality of portions, the reordering mechanism placing the at least one portion in the trace in an order determined by (1) the profile data and (2) at least one reordering method that does not necessarily add to the trace any predecessor portion to the at least one portion in the trace when at least one predecessor portion exists, and that does not necessarily add to the trace any successor portion to the at least one portion in the trace when at least one successor portion exists.
1 Assignment
0 Petitions
Accused Products
Abstract
An apparatus and several methods provide for a more optimized computer program that will have a faster execution time than was possible using the prior art reordering technique that adds to a trace until it finds no more predecessors or successors to add. The apparatus and methods disclosed herein use a variety of methods to reorder the program portions in a more intelligent manner that will improve its run-time performance. Each of these methods involves constructing traces in the control flow graph of the computer program. In a first embodiment, a basic block is only added to a trace if it is not negligible within predetermined limits. This negligibility test results in traces that are not extended for infrequently executed basic blocks. In a second embodiment, a basic block is only added to a trace if it is a perfect partner with the last basic block added to the trace. The concept of a "perfect partner" helps to match basic blocks together in a trace that have the greatest affinity for each other. In a third embodiment, a basic block is only added to a trace if it satisfies "should follow" flags and predetermined negligibility criteria. The "should follow" flags and negligibility criteria provide a complex criteria for adding basic blocks to a trace, criteria that create more efficient code in some specific circumstances.
169 Citations
26 Claims
-
1. An apparatus comprising:
-
at least one processor; a memory coupled to the at least one processor; a computer program residing in the memory comprising a plurality of portions joined by a plurality of program paths; profile data residing in the memory representing the execution frequency of at least one of the plurality of program paths within the computer program; a reordering mechanism residing in the memory and executed by the at least one processor for determining an order for the plurality of portions of the computer program by constructing a plurality of traces, each trace comprising a possible execution path along the plurality of program paths through at least one of the plurality of portions, the reordering mechanism placing the at least one portion in the trace in an order determined by (1) the profile data and (2) at least one reordering method that does not necessarily add to the trace any predecessor portion to the at least one portion in the trace when at least one predecessor portion exists, and that does not necessarily add to the trace any successor portion to the at least one portion in the trace when at least one successor portion exists. - View Dependent Claims (2, 3)
-
-
4. A method for improving execution speed of a computer program on a computer apparatus by reordering a plurality of program portions in a computer program, the method comprising the step of:
constructing a plurality of traces, each trace comprising at least one of the plurality of portions, the step of constructing each trace comprising the steps of; adding a portion to the trace in an order determined by; (1) profile data; and (2) at least one reordering method that does not necessarily add to the trace any predecessor portion to the at least one portion in the trace when at least one predecessor portion exists, and that does not necessarily add to the trace any successor portion to the at least one portion in the trace when at least one successor portion exists; ordering the plurality of traces according to their order of creation; and reordering the plurality of portions of the computer program from the order of the plurality of traces and the order of the plurality of portions within the plurality of traces. - View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12)
-
13. A program product comprising:
-
(A) an optimizer, the optimizer reordering a plurality of program portions in a computer program according to profile data, the optimizer comprising; a reordering mechanism for determining an order for the plurality of portions of the computer program by constructing a plurality of traces, each trace comprising a possible execution path along a plurality of program paths through at least one of the plurality of portions, the reordering mechanism placing the at least one portion in the trace in an order determined by (1) the profile data and (2) at least one reordering method that does not necessarily add to the trace any predecessor portion to the at least one portion in the trace when at least one predecessor portion exists, and that does not necessarily add to the trace any successor portion to the at least one portion in the trace when at least one successor portion exists; and (B) signal bearing media bearing the optimizer. - View Dependent Claims (14, 15, 16)
-
-
17. An apparatus comprising:
-
at least one processor; a memory coupled to the at least one processor; a computer program residing in the memory comprising a plurality of basic blocks joined by a plurality of program paths; profile data residing in the memory representing the execution frequency of at least one of the plurality of program paths within the computer program; a reordering mechanism residing in the memory and executed by the at least one processor for determining an order for the plurality of basic blocks of the computer program by constructing a plurality of traces, each trace comprising a possible execution path along the plurality of program paths through at least one of the plurality of basic blocks, the reordering mechanism placing the at least one basic block in the trace in an order determined by (1) the profile data and (2) at least one reordering method that does not necessarily add to the trace any predecessor basic block to the at least one basic block in the trace when at least one predecessor basic block exists, and that does not necessarily add to the trace any successor basic block to the at least one basic block in the trace when at least one successor basic block exists; the reordering mechanism ordering the plurality of traces according to their order of creation; the reordering mechanism determining the order for the plurality of basic blocks of the computer program from the order of the plurality of traces and the order of the plurality of basic blocks within the plurality of traces.
-
-
18. A method for improving execution speed of a computer program on a computer apparatus by reordering a plurality of basic blocks in a computer program, the method comprising the step of:
constructing a plurality of traces, each trace comprising at least one of the plurality of basic blocks, the step of constructing each trace comprising the steps of; (A) adding a basic block to the trace in an order determined by; (1) profile data; and (2) at least one reordering method that does not necessarily add to the trace any predecessor basic block to the basic block in the trace when at least one predecessor basic block exists, and that does not necessarily add to the trace any successor basic block to the basic block in the trace when at least one successor basic block exists; (B) ordering the plurality of traces according to their order of creation; (C) reordering the plurality of basic blocks of the computer program from the order of the plurality of traces and the order of the plurality of basic blocks within the plurality of traces. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26)
Specification