Dynamic load balancing among processors in a parallel computer
First Claim
1. A method for dynamically balancing a processing workload among parallel processors operating on a program, where the program comprises recursive function calls capable of being executed in parallel, the method comprising:
- compiling both parallel and serial versions of the program, the parallel version being capable of executing in a data-parallel fashion on every processor in a team of processors and the serial version being capable of executing on a single processor;
inserting a test function in a serial version of the program;
upon recursive function calls of the program, subdividing each team of processors into smaller teams, with one team per recursive function call;
when the subdividing step results in a team having only one processor, calling the serial version of the code from the parallel version of the code executing on that processor;
executing the test function on a first processor to determine whether computational cost of a recursive function call to be executed on the first processor exceeds a threshold; and
when the computational cost exceeds a threshold, shipping a function call to an idle processor, executing the function call on the idle processor and returning results of the function call to the first processor.
2 Assignments
0 Petitions
Accused Products
Abstract
A parallel programming system implements dynamic load balancing to distribute processing workload to available processors in a parallel computer. A preprocessor in the system converts a nested parallel program into sequential code executable on processors of the parallel computer and calls to a message passing interface for inter-processor communication among the processors. When processing a nested parallel program, the preprocessor inserts a test function to evaluate the computational cost of a function call. At runtime, processors evaluate the test function to determine whether to ship a function call to another processor. This approach enables processors to offload function calls to other available processors in cases where it is more efficient to incur the cost of shipping the function call and receiving the results than it is to process the function call on the original processor.
387 Citations
13 Claims
-
1. A method for dynamically balancing a processing workload among parallel processors operating on a program, where the program comprises recursive function calls capable of being executed in parallel, the method comprising:
-
compiling both parallel and serial versions of the program, the parallel version being capable of executing in a data-parallel fashion on every processor in a team of processors and the serial version being capable of executing on a single processor;
inserting a test function in a serial version of the program;
upon recursive function calls of the program, subdividing each team of processors into smaller teams, with one team per recursive function call;
when the subdividing step results in a team having only one processor, calling the serial version of the code from the parallel version of the code executing on that processor;
executing the test function on a first processor to determine whether computational cost of a recursive function call to be executed on the first processor exceeds a threshold; and
when the computational cost exceeds a threshold, shipping a function call to an idle processor, executing the function call on the idle processor and returning results of the function call to the first processor. - View Dependent Claims (2, 3, 4, 5)
comparing expected computational cost of the function call with expected communication cost of sending the function call to another processor and receiving results of the function call from the other processor;
based on the comparing step, determining whether to ship the function call to the other processor.
-
-
5. A computer readable medium having instructions for performing the steps of claim 1.
-
6. A computer readable medium comprising:
-
a preprocessor for converting a parallel program written in a programming language with control parallel and data parallel operations to an imperative, sequential programming code and calls to a message passing interface for inter-processor communication on a parallel computer;
wherein the preprocessor is operable to generate a parallel and a serial version of the parallel program and to insert a test function in the serial version of the program to evaluate computational cost of a function call in the serial version;
a compiler for compiling the imperative programming code and the calls to the message passing interface generated by the preprocessor;
a message passing interface run-time library for performing operations of the message passing interface in response to the calls to the message passing interface;
a parallel language run-time library for keeping track of idle processors during execution of the parallel program, for managing requests to send the function call from a requesting processor to an idle processor when the requesting processor determines that the computational cost of the function call exceeds a threshold; and
a linker for linking the compiled imperative programming code and calls to the message passing interface with the message passing interface run-time library and the parallel language run-time library. - View Dependent Claims (7, 8, 9, 10)
a split function to split the nested parallel program into recursive function calls, each capable of being executed in parallel on a team of processors; - and
wherein the preprocessor is operable to generate code for switching from the parallel to the serial version of the program when the program recurses to a point where a team has a single processor.
-
-
10. The computer readable medium of claim 7 wherein the preprocessor is operable to insert a cost function in the parallel version of the program to compute computation cost of each recursive function call, and is operable to insert code for computing the number of processors in each team based on the computational cost of the recursive function calls.
-
11. A computer readable medium having a nested parallel program for execution in a parallel machine with two or more processors, the medium comprising:
-
code for splitting the program into recursive function calls and assigning each recursive function call to a subset of the processors;
code for evaluating when a subset of processors includes a single processor; and
code for evaluating computational cost of a recursive function call and for shipping a function call to an idle processor when the computational cost exceeds a threshold.
-
-
12. A method for dynamically balancing a processing workload among parallel processors operating on a program, where the program comprises function calls capable of being executed in parallel, the method comprising:
-
executing the program on each processor in a group of processors in a parallel computer, the program including a test function for evaluating the computational cost of a function call at runtime;
executing the test function on a first processor to determine whether computational cost of the function call to be executed on the first processor exceeds a threshold; and
when the computational cost computed by the test function exceeds a threshold, shipping the function call to an available processor, executing the function call on the available processor and returning results of the function call to the first processor.
-
-
13. A parallel processing method for executing a parallel divide and conquer program having recursive calls capable of being executed in parallel, the method comprising:
-
spreading a parallel data structure of the program across processors in a parallel computer, at each recursive call of the program, subdividing a group of processors into subgroups, with one subgroup per parallel function call;
selecting the number of processors in each subgroup to approximate the computational cost of the parallel function call for each subgroup;
in one of the processors acting as a manager, mediating among processors that are idle, including;
when an idle processor is detected, adding the idle processor to an idle processor queue;
when an overloaded processor is about to recurse on a function call having a computational size greater than a predetermined threshold, providing the idle processor with a function call from the overloaded processor;
using a lazy collection oriented data type to represent the data of the parallel program;
deferring inter-processor communication by leaving the lazy collection oriented data type in a lazy state until a function call is made that operates on the lazy collection oriented data type and requires the lazy collection oriented data type to be redistributed among the processors;
in a global communication step, redistributing the data structures among the subgroups.
-
Specification