Message Flow Control in a Multi-Node Computer System
Abstract
Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications running on nodes assigned to the node pools. As the application is executed, logging and/or tracing messages are generated on the compute nodes according to message flow control policies assigned to the nodes. Optionally, the message flow is analyzed, the message flow control policies are adjusted, and duplicate messages are eliminated.
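The grouping and policy scheme summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `FlowPolicy`, `NodePool`, `ServiceNode`, `assign`, and `emit` are hypothetical, the policy is reduced to two on/off switches, and duplicates are assumed to be messages with identical kind and text, a criterion the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowPolicy:
    """Hypothetical message flow control policy: which activities are enabled."""
    logging: bool = False
    tracing: bool = False


@dataclass
class NodePool:
    """A named group of compute nodes that share one policy."""
    name: str
    policy: FlowPolicy
    nodes: set = field(default_factory=set)


class ServiceNode:
    """Stand-in for a control program that groups nodes and collects messages."""

    def __init__(self):
        self.pools = {}         # pool name -> NodePool
        self.node_to_pool = {}  # node id -> NodePool
        self.messages = []      # accepted (node_id, kind, text) tuples
        self._seen = set()      # (kind, text) pairs already recorded

    def assign(self, pool_name, policy, node_ids):
        """Create a pool and apply its policy to every listed node."""
        pool = NodePool(pool_name, policy, set(node_ids))
        self.pools[pool_name] = pool
        for node_id in pool.nodes:
            self.node_to_pool[node_id] = pool

    def emit(self, node_id, kind, text):
        """Record a message only if the node's policy enables its kind."""
        if kind not in ("log", "trace"):
            raise ValueError(f"unknown message kind: {kind}")
        policy = self.node_to_pool[node_id].policy
        enabled = policy.logging if kind == "log" else policy.tracing
        if not enabled:
            return False
        # Assumed duplicate rule: same kind and same text from any node.
        key = (kind, text)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.messages.append((node_id, kind, text))
        return True


service = ServiceNode()
service.assign("verbose", FlowPolicy(logging=True, tracing=True), [0, 1])
service.assign("quiet", FlowPolicy(logging=True), [2, 3])

# Every node runs the same application instance and tries to emit both kinds.
for node_id in range(4):
    service.emit(node_id, "log", "application started")
    service.emit(node_id, "trace", f"node {node_id} entered main loop")
```

In this sketch only nodes 0 and 1 produce trace messages, and the identical "application started" log line is kept once rather than four times. Adjusting a policy at run time would amount to calling `assign` again with a different `FlowPolicy`.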
25 Claims
1. A computer-implemented method for controlling message flow in a parallel computing system having a plurality of compute nodes, the method comprising:
assigning a first set of compute nodes to a first node pool, wherein a first message flow control policy is assigned to each compute node of the first node pool, and wherein the message flow control policy specifies at least one of logging and/or tracing activity to be performed by an instance of an application running on at least a first compute node assigned to the first node pool;
initiating execution of the application on each of the compute nodes in the first node pool; and
while executing the application on at least the first compute node, generating one or more logging or tracing messages according to the first message flow control policy.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-readable storage medium containing a program which, when executed by a processor, performs an operation for controlling message flow in a parallel computing system having a plurality of compute nodes, the operation comprising:
assigning a first set of compute nodes to a first node pool, wherein a first message flow control policy is assigned to each compute node of the first node pool, and wherein the message flow control policy specifies at least one of logging and/or tracing activity to be performed by an instance of an application running on at least a first compute node assigned to the first node pool;
initiating execution of the application on each of the compute nodes in the first node pool; and
while executing the application on at least the first compute node, generating one or more logging or tracing messages according to the first message flow control policy.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A parallel computing system, comprising:
a plurality of compute nodes, each having at least a processor and a memory, wherein the plurality of compute nodes is configured to execute a parallel computing task; and
a service node having at least a processor and a memory and a tracing-logging control program for controlling message flow in the parallel computing system, wherein the tracing-logging control program is configured to:
assign a first set of compute nodes to a first node pool, wherein a first message flow control policy is assigned to each compute node of the first node pool, and wherein the message flow control policy specifies at least one of logging and/or tracing activity to be performed by an instance of an application running on at least a first compute node assigned to the first node pool; and
initiate execution of the application on each of the compute nodes in the first node pool, wherein at least the first compute node is configured to generate, while executing the application, one or more logging or tracing messages according to the first message flow control policy.
Dependent claims: 23, 24, 25
Specification