Circular scheduling method and apparatus for executing computer programs by moving independent instructions out of a loop
First Claim
1. A computer implemented method for improving loop execution performance in executing a computer program, said loop comprising a first plurality of computer instructions and salad computer program including instructions for sequentially executing a number of iterations of said loop wherein said loop has a first iteration, the method comprising:
- (a) identifying, among said loop instructions, at least one independent instruction which does not require previous execution of another loop instruction in the same iteration;
(b) moving a first iteration of said at least one independent instruction to a location prior to said loop;
(c) moving second and subsequent iterations of said at least one independent instruction to a preceding iteration;
(d) moving all unmoved instructions in the last iteration of said loop to a location after said loop;
(e) reducing the number of iterations in the loop by one;
wherein performing steps (a) through (e) results in providing at least one of said first plurality of instructions as an independent instruction; and
(f) moving said at least one of said first plurality of instructions into a delay slot produced by another loop instruction.
8 Assignments
0 Petitions
Accused Products
Abstract
A procedure which is a particular type of software pipelining is provided which increases the efficiency with which code is executed by reducing or eliminating stalls such as by filling delay slots. The process includes moving instructions in a loop from one loop iteration to another. The moving of instructions provides the scheduler with additional independent instructions in a given basic block so the scheduler has greater freedom to move instructions into unfilled delay slots. The procedure includes changing the entry point into the loop, thus effectively moving an instruction from near the top of the loop to near the bottom of the loop, while changing the iteration number of the moved instruction.
53 Citations
15 Claims
-
1. A computer implemented method for improving loop execution performance in executing a computer program, said loop comprising a first plurality of computer instructions and salad computer program including instructions for sequentially executing a number of iterations of said loop wherein said loop has a first iteration, the method comprising:
-
(a) identifying, among said loop instructions, at least one independent instruction which does not require previous execution of another loop instruction in the same iteration; (b) moving a first iteration of said at least one independent instruction to a location prior to said loop; (c) moving second and subsequent iterations of said at least one independent instruction to a preceding iteration; (d) moving all unmoved instructions in the last iteration of said loop to a location after said loop; (e) reducing the number of iterations in the loop by one; wherein performing steps (a) through (e) results in providing at least one of said first plurality of instructions as an independent instruction; and (f) moving said at least one of said first plurality of instructions into a delay slot produced by another loop instruction. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. In a computer, apparatus for improving loop execution performance in executing a computer program, said loop comprising a first plurality of computer instructions and said computer program including instructions for sequentially executing a number of iterations of said loop wherein said loop has a first iteration, the apparatus comprising:
-
(a) means for identifying, among said loop instructions, at least one independent instruction which does not require previous execution of another loop instruction in the same iteration; (b) means coupled the said means for identifying, for moving a first iteration of said at least one independent instruction to a location prior to said loop; (c) means, coupled to said means for identifying, for moving second and subsequent iterations of said at least one independent instruction to a preceding iteration; (d) means, coupled to said means for identifying, for moving all unmoved instructions in the last iteration of said loop to a location after said loop; (e) means, coupled to said means for identifying, for reducing the number of iterations in the loop by one; wherein means (a) through (e) provide at least one of said first plurality of instructions as an independent instruction; (f) means, coupled to said means for identifying, for moving said at least one of said first plurality of instructions into a delay slot produced by another loop instruction. - View Dependent Claims (14)
-
-
15. A computer-implemented method for use in a compiler program, the compiler program being a program for converting source code of a computer program into executable object code, the executable object code including a plurality of instructions executable by a computer, the computer having a plurality of allocatable registers, the compiler including a scheduler for receiving unscheduled object code and establishing the order of execution of executable object code instructions, the method being for use in increasing the efficiency of execution of the executable object code by decreasing the number of unfilled delay slots in the executable object cede, the method comprising:
-
(a) allocating said allocatable registers for use during execution of the executable object code; (b) renaming at least one of said allocatable registers such that first and second tasks that would otherwise use one register for two sequential tasks will instead use two registers, thus eliminating a dependency between the tasks; (c) submitting a first portion of said unscheduled object code, which defines a programming loop having a plurality of iterations, to the scheduler, to produce a first scheduled loop code, having a plurality of instructions, said first scheduled loop code having a loop beginning and a loop end; (d) determining if there are unfilled delay slots in said first scheduled loop code; (e) when there are no unfilled delay slots in said first scheduled loop code, using said first scheduled loop code as part of the executable object ccode; (f) when there are unfilled delay slots in said first scheduled loop code, selecting a first, independent instruction from among the instructions in the first scheduled loop code, an independent instruction being an instruction which does not require a previous instruction in the loop to be executed first; (g) moving a first iteration of the selected first independent instruction to a position before said loop beginning and moving each subsequent iteration of the selected first independent instruction a position in an immediately previous iteration and moving the last iteration of all instructions except for said selected first independent instruction to a position after said loop end, to produce a first modified object code; (h) submitting said first modified object code to the scheduler to produce a second scheduled loop code; (i) determining if there are unfilled delay slots in said second scheduled loop code; (j) when here are no unfilled delay slots in said second scheduled loop code, using said second scheduled loop code as part of the executable object code; (k) when there are unfilled delay slots, and if there are previously unmoved independent instructions in the second scheduled loop code, selecting a second independent instruction which is previously unmoved; (1) moving the first iteration of the selected second independent instruction to a position before said loop beginning and moving each subsequent iteration of the selected second independent instruction to a position in the immediately previous iteration to produce a second modified object code; (m) submitting the second modified object code to the scheduler to produce a third scheduled loop code; (n) selecting one among a plurality of scheduled loop codes as a preferred scheduled loop code, said plurality of scheduled loop s including at least said first, second and third scheduled loop codes, said preferred scheduled loop code having no more unfilled delay slots than any of the other of said plurality of scheduled loop codes, and using said preferred scheduled loop code as part of the executable object code.
-
Specification