PARALLEL COMPUTER SYSTEM AND PROGRAM
First Claim
1. A parallel computer system for performing parallel computation by connecting a plurality of computing units via a network, wherein the computing unit serves as a master node for performing synchronization process or as a worker node for performing task process, wherein the master node includes:
setting a master determination time with the expectation that the task process in all the worker nodes is completed within a basic process time;
transmitting a process start notification to the plurality of worker nodes;
checking whether a process-not-completed notification is received from the worker node at the master determination time;
when the process-not-completed notification is received, transmitting a process extension notification to the plurality of worker nodes; and
when the process-not-completed notification is not received, transmitting a synchronization completion notification to the plurality of worker nodes;
wherein the worker node includes:
setting a worker determination time by using the basic process time when the process start notification or the synchronization completion notification is received from the master node;
when the task process is not completed at the worker determination time, transmitting the process-not-completed notification to the master node; and
when the task process is completed, waiting for the synchronization completion notification from the master node.
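The notification exchange recited in claim 1 can be illustrated with a single synchronization round. The Python sketch below is a simplified model, not the patented implementation: the names `worker_check`, `master_check`, and `run_round` are hypothetical, messages are assumed to arrive instantly, and the worker determination time is assumed to equal the basic process time (the claim only says it is set "by using" the basic process time).

```python
# Hypothetical message labels, named after the notifications in claim 1.
PROCESS_NOT_COMPLETED = "process-not-completed"
PROCESS_EXTENSION = "process-extension"
SYNC_COMPLETION = "synchronization-completion"


def worker_check(task_duration, worker_determination_time):
    """Return the notification a worker sends at its determination time.

    A worker whose task is still running sends process-not-completed.
    A worker whose task is done sends nothing and simply waits.
    """
    if task_duration > worker_determination_time:
        return PROCESS_NOT_COMPLETED
    return None


def master_check(received):
    """Decide what the master broadcasts at the master determination time."""
    if PROCESS_NOT_COMPLETED in received:
        return PROCESS_EXTENSION
    return SYNC_COMPLETION


def run_round(task_durations, basic_process_time):
    """Simulate one round: workers check first, then the master decides."""
    # Assumption: the worker determination time equals the basic process time.
    worker_determination_time = basic_process_time
    received = []
    for duration in task_durations:
        note = worker_check(duration, worker_determination_time)
        if note is not None:
            received.append(note)
    return master_check(received)


if __name__ == "__main__":
    print(run_round([1.0, 2.0], basic_process_time=3.0))
    print(run_round([1.0, 4.0], basic_process_time=3.0))
```

In this model, silence from every worker is what lets the master declare synchronization complete, so workers that finish on time send no message at all.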
Abstract
There is provided a parallel computer system that performs time-based barrier synchronization using a master node and a plurality of worker nodes, allowing the synchronization time to be set adaptively. When the task process in a worker node has not been completed by a worker determination time, that worker node notifies the master node that the process has not been completed. When the master node has received this notification by a master determination time, it notifies the worker nodes that the process time is extended by a correction process time, thereby adjusting and extending the synchronization time. In this way, it is possible to reduce the synchronization overhead associated with executing an application whose process time varies relatively widely from one synchronization point to the next.
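The adaptive extension described in the abstract can be worked through numerically. The Python sketch below is an illustration under stated assumptions, not the method of the patent: the function name `synchronization_time` is hypothetical, communication delay is ignored, and the check is assumed to repeat after every extension until no worker reports an unfinished task.

```python
def synchronization_time(task_durations, basic_process_time, correction_process_time):
    """Return (synchronization time, number of extensions) for one barrier.

    The deadline starts at the basic process time and grows by the
    correction process time each time some task is still unfinished.
    """
    if correction_process_time <= 0:
        raise ValueError("correction_process_time must be positive")
    deadline = basic_process_time
    extensions = 0
    # Assumption: the not-completed check is repeated after each extension.
    while any(duration > deadline for duration in task_durations):
        deadline += correction_process_time
        extensions += 1
    return deadline, extensions


if __name__ == "__main__":
    # One worker overruns the basic process time of 2.0 by half a unit.
    print(synchronization_time([1.0, 2.5], 2.0, 1.0))
```

With these inputs the barrier is extended only when a worker actually overruns, instead of every round being budgeted for the slowest case.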
15 Claims
1. A parallel computer system for performing parallel computation by connecting a plurality of computing units via a network, wherein the computing unit serves as a master node for performing synchronization process or as a worker node for performing task process, wherein the master node includes:
setting a master determination time with the expectation that the task process in all the worker nodes is completed within a basic process time;
transmitting a process start notification to the plurality of worker nodes;
checking whether a process-not-completed notification is received from the worker node at the master determination time;
when the process-not-completed notification is received, transmitting a process extension notification to the plurality of worker nodes; and
when the process-not-completed notification is not received, transmitting a synchronization completion notification to the plurality of worker nodes;
wherein the worker node includes:
setting a worker determination time by using the basic process time when the process start notification or the synchronization completion notification is received from the master node;
when the task process is not completed at the worker determination time, transmitting the process-not-completed notification to the master node; and
when the task process is completed, waiting for the synchronization completion notification from the master node.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A program executed by a processor of a parallel computer system for performing parallel computation by connecting a plurality of computing units via a network, wherein the computing unit includes the processor and a storage unit, serving as a master node for performing synchronization process or as a worker node for performing task process, wherein the program causes the processor of the computing unit serving as the master node to perform the steps of:
setting a master determination time with the expectation that the task process in all the worker nodes is completed within a basic process time;
transmitting a process start notification to the plurality of worker nodes;
checking whether a process-not-completed notification is received from the worker node at the master determination time;
when the process-not-completed notification is received, transmitting a process extension notification to the plurality of worker nodes; and
when the process-not-completed notification is not received, transmitting a synchronization completion notification to the plurality of worker nodes;
wherein the program causes the computing unit serving as the worker node to perform the steps of:
setting a worker determination time by using the basic process time when the process start notification or the synchronization completion notification is received from the master node;
when the task process is not completed at the worker determination time, transmitting the process-not-completed notification to the master node; and
when the task process is completed, waiting for the synchronization completion notification from the master node.
Dependent claims: 10, 11, 12, 13, 14, 15
Specification