Hardware-based dynamic load balancing of message passing interface tasks
First Claim
1. A method, in a multiple processor system, for executing a message passing interface (MPI) job using a plurality of processors, comprising:
receiving one or more MPI synchronization operation calls from one or more processors of the plurality of processors, wherein the MPI synchronization operation calls include an identifier of an MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
storing an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modifying an operation of the plurality of processors for executing the MPI job based on the history data structure by:
determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and
in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor;
determining a measure of the relative completion of computation phases of tasks of the MPI job on the plurality of processors based on the history data structure; and
modifying the operation of the plurality of processors based on the relative completion of computation phases of tasks of the MPI job, wherein the measure of the relative completion of computation phases of tasks of the MPI job indicates a relative order in which the processors in the plurality of processors completed their respective computation phases of tasks.
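The history data structure and threshold check the claim recites can be modeled in software. The sketch below is purely illustrative and is not the patented hardware implementation; the names `SyncHistory`, `SyncCall`, and `WAIT_THRESHOLD` are hypothetical, as are the sample timestamps.

```python
# Illustrative software model of the claimed history data structure:
# each MPI synchronization call is recorded with its task identifier
# and timestamp, and the history is queried for wait periods and the
# relative completion order of computation phases.
from dataclasses import dataclass, field

WAIT_THRESHOLD = 0.5  # assumed threshold (seconds) triggering rebalancing


@dataclass
class SyncCall:
    task_id: int      # identifier of the MPI task making the call
    timestamp: float  # time at which the synchronization call was made


@dataclass
class SyncHistory:
    entries: list = field(default_factory=list)

    def record(self, task_id, timestamp):
        """Store an entry for one MPI synchronization operation call."""
        self.entries.append(SyncCall(task_id, timestamp))

    def completion_order(self):
        """Relative order in which tasks completed their computation phases."""
        return [c.task_id for c in sorted(self.entries, key=lambda c: c.timestamp)]

    def wait_period(self, task_id):
        """Time the given task spent waiting for the last task to synchronize."""
        last = max(c.timestamp for c in self.entries)
        mine = next(c.timestamp for c in self.entries if c.task_id == task_id)
        return last - mine


history = SyncHistory()
history.record(task_id=2, timestamp=10.0)  # fastest task, synchronizes first
history.record(task_id=0, timestamp=10.4)
history.record(task_id=1, timestamp=11.2)  # slowest task, synchronizes last

needs_rebalance = history.wait_period(2) >= WAIT_THRESHOLD
```

With these sample timestamps, task 2 waits 1.2 s for task 1, exceeding the assumed threshold, so the modifying step of the claim would be triggered.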
Abstract
Mechanisms for providing hardware-based dynamic load balancing of message passing interface (MPI) tasks are provided. Mechanisms are provided for adjusting the balance of processing workloads of the processors executing tasks of an MPI job, so as to minimize the wait periods spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware-implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
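The workload-shifting decision the abstract describes can be sketched as a simple function over the synchronization-call profile: the slowest task (last to synchronize) gives up part of its data to the fastest. This is an illustrative assumption-laden sketch, not the patent's hardware controller; the function name and the 10% shift fraction are invented for the example.

```python
# Hypothetical sketch of shifting workload from the slowest to the
# fastest task, based on the timestamps of their synchronization calls.
def rebalance(sync_timestamps, work_shares, shift_fraction=0.10):
    """sync_timestamps: {task_id: time of its synchronization call};
    work_shares: {task_id: fraction of the total data it processes}.
    Returns adjusted shares moving work from slowest to fastest task."""
    fastest = min(sync_timestamps, key=sync_timestamps.get)  # first to sync
    slowest = max(sync_timestamps, key=sync_timestamps.get)  # last to sync
    shares = dict(work_shares)
    moved = shares[slowest] * shift_fraction
    shares[slowest] -= moved
    shares[fastest] += moved
    return shares


shares = rebalance({0: 10.4, 1: 11.2, 2: 10.0},
                   {0: 1 / 3, 1: 1 / 3, 2: 1 / 3})
# Task 1 (slowest) now holds less data; task 2 (fastest) holds more.
```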
17 Claims
1. A method, in a multiple processor system, for executing a message passing interface (MPI) job using a plurality of processors, comprising:
receiving one or more MPI synchronization operation calls from one or more processors of the plurality of processors, wherein the MPI synchronization operation calls include an identifier of an MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
storing an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modifying an operation of the plurality of processors for executing the MPI job based on the history data structure by:
determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and
in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor;
determining a measure of the relative completion of computation phases of tasks of the MPI job on the plurality of processors based on the history data structure; and
modifying the operation of the plurality of processors based on the relative completion of computation phases of tasks of the MPI job, wherein the measure of the relative completion of computation phases of tasks of the MPI job indicates a relative order in which the processors in the plurality of processors completed their respective computation phases of tasks.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
14. A method, in a multiple processor system, for executing a message passing interface (MPI) job using a plurality of processors, comprising:
receiving one or more MPI synchronization operation calls from one or more processors of the plurality of processors, wherein the MPI synchronization operation calls include an identifier of an MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
storing an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modifying an operation of the plurality of processors for executing the MPI job based on the history data structure by:
determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and
in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor,
wherein modifying an operation of the plurality of processors for executing the MPI job based on the entries in the history data structure comprises:
performing, in a current MPI job processing cycle, one or more setup operations in a second processor of the plurality of processors for preparing to process one of a larger portion of data or a larger number of tasks in a subsequent MPI job processing cycle subsequent to the current MPI job processing cycle, wherein the one or more setup operations are performed while other processors of the plurality of processors are executing their respective tasks of the MPI job in the current MPI job processing cycle.
- View Dependent Claims (15, 16)
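The setup-overlap idea of claim 14 can be modeled simply: a processor that finishes its computation phase early uses the otherwise-idle window of the current cycle to prepare (for example, allocate a larger buffer) for the larger data portion it will receive next cycle. This is a hypothetical sketch; the function name, timings, and buffer-allocation setup operation are illustrative assumptions.

```python
# Illustrative model of claim 14: perform setup for the next cycle
# during the idle period while slower processors are still computing.
def run_cycle(my_finish_time, barrier_time, next_cycle_elements):
    """my_finish_time: when this processor finished its computation phase;
    barrier_time: when the last processor reaches the synchronization point.
    Returns a buffer prepared during the idle window, or None if this
    processor had no idle time before the barrier."""
    idle = barrier_time - my_finish_time
    if idle > 0:
        # Setup operation overlapped with other processors' computation:
        # preallocate storage for the larger portion of data assigned to
        # this processor in the subsequent MPI job processing cycle.
        next_buffer = [0.0] * next_cycle_elements
        return next_buffer
    return None


buf = run_cycle(my_finish_time=10.0, barrier_time=11.2,
                next_cycle_elements=2048)
```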
17. A method, in a multiple processor system, for executing a message passing interface (MPI) job using a plurality of processors, comprising:
receiving one or more MPI synchronization operation calls from one or more processors of the plurality of processors, wherein the MPI synchronization operation calls include an identifier of an MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
storing an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modifying an operation of the plurality of processors for executing the MPI job based on the history data structure by:
determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and
in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor,
wherein modifying the operation of the plurality of processors for executing the MPI job based on the history data structure comprises selecting another program for execution on at least one of the processors of the plurality of processors during an idle period before a last processor in the plurality of processors calls the MPI synchronization operation and while other processors in the plurality of processors are executing their tasks of the MPI job.
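The idle-period scheduling of claim 17 can be sketched as picking other ready programs that fit in the window before the last processor synchronizes. The ready queue, its estimated-runtime field, and the program names below are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of claim 17: during the idle period before the
# last processor reaches the MPI synchronization point, a fast processor
# executes other ready programs instead of spinning.
import heapq


def fill_idle(idle_seconds, ready_queue):
    """ready_queue: min-heap of (estimated_runtime, program_name) pairs.
    Pops and 'runs' programs whose estimated runtime fits in the
    remaining idle window; returns the names of the programs run."""
    ran = []
    while ready_queue and ready_queue[0][0] <= idle_seconds:
        runtime, name = heapq.heappop(ready_queue)
        idle_seconds -= runtime
        ran.append(name)
    return ran


queue = [(0.3, "checksum"), (0.8, "compress"), (5.0, "reindex")]
heapq.heapify(queue)
ran = fill_idle(idle_seconds=1.2, ready_queue=queue)
```

With a 1.2 s idle window, the two short programs fit and the 5 s one is left queued for a later, longer idle period.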
Specification