Parallel computer system and program
Abstract
There is provided a parallel computer system that performs time-based barrier synchronization using a master node and a plurality of worker nodes, allowing the synchronization time to be set adaptively. When a task process in a worker node has not been completed by the worker determination time, that worker node notifies the master node that the process has not been completed. When the master node has received such a notification by the master determination time, it notifies the worker nodes that the process time is extended by a correction process time, thereby adjusting and extending the synchronization time. In this way, it is possible to reduce the synchronization overhead when executing an application whose process time from one synchronization point to the next varies considerably.
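The timing effect described in the abstract can be illustrated with a small calculation. The sketch below is a hypothetical model, not the patented implementation: times are measured from the process start notification, message latency is ignored, each extension moves the worker determination time forward by the correction process time, and the master determination time trails the worker determination time by an assumed `margin`. The names `adaptive_sync_time` and `fixed_sync_time`, and the fixed-window baseline, are illustrative assumptions.

```python
def adaptive_sync_time(task_times, basic_time, correction_time, margin):
    """Return (synchronization completion time, number of extensions) for one interval.

    Hypothetical model: zero message latency; the worker determination time
    starts at basic_time and moves by correction_time per extension; the
    master determination time trails it by `margin`.
    """
    if correction_time <= 0:
        raise ValueError("correction_time must be positive")
    worker_deadline = basic_time
    extensions = 0
    # Some worker is still busy at its determination time, so it reports
    # "not completed" and the master broadcasts a process extension.
    while any(t > worker_deadline for t in task_times):
        extensions += 1
        worker_deadline += correction_time
    # No report reached the master: it sends synchronization completion.
    return worker_deadline + margin, extensions


def fixed_sync_time(task_times_per_interval, margin):
    """Hypothetical baseline: one fixed window sized for the slowest task of any interval."""
    return max(max(ts) for ts in task_times_per_interval) + margin


intervals = [[8, 9, 10], [8, 12, 10], [8, 21]]
adaptive = [adaptive_sync_time(ts, basic_time=10, correction_time=5, margin=1) for ts in intervals]
print(adaptive)  # [(11, 0), (16, 1), (26, 3)]
print(sum(t for t, _ in adaptive), fixed_sync_time(intervals, margin=1) * len(intervals))  # 53 66
```

With these made-up task times the adaptive scheme totals 53 time units against 66 for a fixed worst-case window. The last interval also shows the trade-off: because extensions come in whole correction steps, it overshoots the slowest task (26 versus 22), so a smaller correction time means more extension messages but less idle time.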
16 Claims
1. A parallel computer system for performing parallel computation, the system comprising:
a plurality of computing units connected via a network, the computing units serving as a master node performing synchronization processes or as a worker node performing task processes, wherein one computing unit is designated as the master node and the other computing units are designated as the worker nodes, wherein the master node:
sets a master determination time, before which the task processes in all the worker nodes are expected to be completed within a basic process time of each task process;
transmits a process start notification to the plurality of worker nodes;
checks whether a process-not-completed notification is received from any of the worker nodes at the master determination time;
if the process-not-completed notification is received, transmits a process extension notification to the plurality of worker nodes regardless of whether the worker nodes have completed their task processes; and
if the process-not-completed notification is not received, transmits a synchronization completion notification to the plurality of worker nodes;
and wherein each worker node:
sets a worker determination time using the basic process time when the process start notification is received from the master node;
if the task process is not completed at the worker determination time, transmits the process-not-completed notification to the master node; and
if the task process is completed at the worker determination time, does not send the process-not-completed notification and waits for the synchronization completion notification from the master node.
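The decisions claim 1 assigns to each node at its determination time can be written as small pure functions. This is a minimal sketch, assuming hypothetical names (`Notification`, `master_decide`, `worker_decide`) and an assumed `margin` that places the master determination time after the worker determination time so a late report can arrive in time; the claim itself does not specify how the times are computed beyond using the basic process time.

```python
from enum import Enum, auto


class Notification(Enum):
    """Hypothetical names for the four notifications recited in claim 1."""
    PROCESS_START = auto()
    PROCESS_NOT_COMPLETED = auto()
    PROCESS_EXTENSION = auto()
    SYNCHRONIZATION_COMPLETION = auto()


def master_determination_time(start_time, basic_time, margin):
    # Assumption: `margin` leaves room for a late worker's report to arrive.
    return start_time + basic_time + margin


def worker_determination_time(start_received_at, basic_time):
    # Set "using the basic process time" when the start notification arrives.
    return start_received_at + basic_time


def master_decide(received, worker_ids):
    """At the master determination time, pick one notification and address it to every worker."""
    if Notification.PROCESS_NOT_COMPLETED in received:
        # Sent to all workers, including those that have already finished.
        decision = Notification.PROCESS_EXTENSION
    else:
        decision = Notification.SYNCHRONIZATION_COMPLETION
    return {wid: decision for wid in worker_ids}


def worker_decide(task_completed):
    """At the worker determination time: report if still busy, else send nothing and wait."""
    return None if task_completed else Notification.PROCESS_NOT_COMPLETED
```

Returning one decision addressed to every worker id mirrors the claim's "regardless of whether the worker nodes have completed" wording: the master never tracks which worker was late, only whether any report arrived.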
9. A program stored in a non-transitory computer readable medium and executed by a processor of a plurality of computing units connected via a network, wherein the plurality of computing units are part of a parallel computer system for performing parallel computation, the computing units each comprise the processor and the non-transitory computer readable medium, the computing units serve as a master node performing synchronization processes or as a worker node performing task processes, and one computing unit is designated as the master node and the other computing units are designated as the worker nodes,
wherein the program causes the processor of the computing unit serving as the master node to perform the steps of:
setting a master determination time, before which the task processes in all the worker nodes are expected to be completed within a basic process time of each task process;
transmitting a process start notification to the plurality of worker nodes;
checking whether a process-not-completed notification is received from any of the worker nodes at the master determination time;
when the process-not-completed notification is received, transmitting a process extension notification to the plurality of worker nodes regardless of whether the worker nodes have completed their task processes; and
when the process-not-completed notification is not received, transmitting a synchronization completion notification to the plurality of worker nodes,
and wherein the program causes a computing unit serving as a worker node to perform the steps of:
setting a worker determination time by using the basic process time when the process start notification is received from the master node;
when the task process is not completed at the worker determination time, transmitting the process-not-completed notification to the master node; and
when the task process is completed at the worker determination time, not sending the process-not-completed notification and waiting for the synchronization completion notification from the master node.
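As a rough illustration of the program in claim 9, the master and worker steps can run as threads that exchange notifications over `queue.Queue` objects standing in for the network. This is a sketch under stated assumptions, not the patented program: the message tags, `run`, the `margin` offset, and extending both determination times by `correction` (the abstract's "correction process time") are all hypothetical choices, and `time.sleep` stands in for the real task process.

```python
import queue
import threading
import time

# Hypothetical message tags for the notifications named in claim 9.
START, NOT_COMPLETED, EXTENSION, SYNC_DONE = "start", "not_completed", "extension", "sync_done"


def worker(wid, task_seconds, basic, correction, inbox, reports, log):
    if inbox.get() != START:                       # block until the process start notification
        raise RuntimeError("expected a process start notification")
    deadline = time.monotonic() + basic            # worker determination time
    done = threading.Event()

    def task_process():
        time.sleep(task_seconds)                   # stand-in for real computation
        done.set()

    threading.Thread(target=task_process, daemon=True).start()
    while True:
        # Returns early if the task finishes; otherwise times out at the deadline.
        if not done.wait(timeout=max(0.0, deadline - time.monotonic())):
            reports.put((wid, NOT_COMPLETED))
        msg = inbox.get()                          # extension or synchronization completion
        if msg == SYNC_DONE:
            log.append((wid, done.is_set()))
            return
        deadline += correction                     # assumption: extend by the correction time


def master(inboxes, reports, basic, correction, margin):
    deadline = time.monotonic() + basic + margin   # master determination time
    for inbox in inboxes:
        inbox.put(START)
    extensions = 0
    while True:
        time.sleep(max(0.0, deadline - time.monotonic()))
        late = False
        while True:                                # drain reports that arrived in time
            try:
                reports.get_nowait()
                late = True
            except queue.Empty:
                break
        decision = EXTENSION if late else SYNC_DONE
        for inbox in inboxes:                      # every worker, finished or not
            inbox.put(decision)
        if not late:
            return extensions
        extensions += 1
        deadline += correction


def run(task_times, basic=0.2, correction=0.2, margin=0.1):
    inboxes = [queue.Queue() for _ in task_times]
    reports = queue.Queue()
    log = []
    threads = [
        threading.Thread(target=worker, args=(i, t, basic, correction, inboxes[i], reports, log))
        for i, t in enumerate(task_times)
    ]
    for th in threads:
        th.start()
    extensions = master(inboxes, reports, basic, correction, margin)
    for th in threads:
        th.join()
    return extensions, sorted(log)
```

Calling `run([0.05, 0.1, 0.3])` should yield one extension: the third worker is still busy at about 0.2 s and reports, the master extends at about 0.3 s, and synchronization completes at about 0.5 s with every task finished. Real wall-clock timing is used, so a heavily loaded machine could perturb the result; a production system would use the platform's own messaging and timers.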
16. A method for controlling a parallel computer system for performing parallel computation by connecting a plurality of computing units via a network, the computing units serving as a master node for performing synchronization processes or as a worker node for performing task processes, wherein one computing unit is designated as the master node while the other computing units are designated as the worker nodes, the method comprising the steps of:
setting, by the master node, a master determination time before which the task processes in all the worker nodes are expected to be completed within a basic process time of each task process;
transmitting, by the master node, a process start notification to the plurality of worker nodes;
checking, by the master node, whether a process-not-completed notification is received from any of the worker nodes at the master determination time;
when the process-not-completed notification is received, transmitting, by the master node, a process extension notification to the plurality of worker nodes regardless of whether the worker nodes have completed their task processes;
when the process-not-completed notification is not received, transmitting, by the master node, a synchronization completion notification to the plurality of worker nodes;
setting, by each worker node, a worker determination time by using the basic process time when the process start notification is received from the master node;
when the task process is not completed at the worker determination time, transmitting, by each worker node, the process-not-completed notification to the master node; and
when the task process is completed at the worker determination time, not sending, by each worker node, the process-not-completed notification and waiting, by each worker node, for the synchronization completion notification from the master node.
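The order in which the method steps of claim 16 fire can be listed as a timestamped trace. This minimal sketch assumes the same hypothetical timing model as a simple simulation (zero message latency, worker determination time at `basic_time` plus one `correction_time` per extension, master determination time `margin` later); `method_trace` and its step labels are illustrative names, not terms from the claims.

```python
def method_trace(task_times, basic_time, correction_time, margin):
    """Return the ordered (time, actor, step) list for one synchronization interval.

    Hypothetical timing model: zero message latency, worker determination
    time = basic_time (+ correction_time per extension), master determination
    time = worker determination time + margin.
    """
    if correction_time <= 0 or margin <= 0:
        raise ValueError("correction_time and margin must be positive")
    steps = [(0, "master", "set master determination time"),
             (0, "master", "transmit process start")]
    steps += [(0, f"worker{i}", "set worker determination time") for i in range(len(task_times))]
    worker_time = basic_time
    while True:
        # Workers still busy at their determination time report to the master.
        late = [i for i, t in enumerate(task_times) if t > worker_time]
        steps += [(worker_time, f"worker{i}", "transmit process-not-completed") for i in late]
        if not late:
            # Finished workers stayed silent; the master completes the barrier.
            steps.append((worker_time + margin, "master", "transmit synchronization completion"))
            return steps
        steps.append((worker_time + margin, "master", "transmit process extension"))
        worker_time += correction_time


for row in method_trace([8, 12], basic_time=10, correction_time=5, margin=1):
    print(row)
```

For task times of 8 and 12 the trace shows `worker1` reporting at time 10, the master extending at 11, and synchronization completing at 16; `worker0`, which finished at 8, never transmits anything and simply waits.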
Specification