Recording A Communication Pattern and Replaying Messages in a Parallel Computing System
First Claim
1. A parallel computer system comprising:
a plurality of compute nodes, each of said compute nodes comprising:
at least one processor;
at least one memory; and
a direct memory address engine coupled to said at least one processor and said at least one memory; and
a network interconnecting said plurality of compute nodes;
wherein:
said network operates a global message-passing application for performing communications across said network;
local instances of said global message-passing application operate at each of said compute nodes to carry out local processing operations independent of processing operations carried out at another one of said compute nodes;
said direct memory address engines are configured to interact with said local instances of said global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of said memories;
said local instances of said global message passing application are configured to record, in said injection FIFO in said corresponding one of said memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and
said local instances of said global message passing application are configured to replay said message descriptors during a subsequent iteration of said executing application program.
Abstract
A parallel computer system includes a plurality of compute nodes. Each of the compute nodes includes at least one processor, at least one memory, and a direct memory address engine coupled to the at least one processor and the at least one memory. The system also includes a network interconnecting the plurality of compute nodes. The network operates a global message-passing application for performing communications across the network. Local instances of the global message-passing application operate at each of the compute nodes to carry out local processing operations independent of processing operations carried out at another one of the compute nodes. The direct memory address engines are configured to interact with the local instances of the global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of the memories. The local instances of the global message passing application are configured to record, in the injection FIFO in the corresponding one of the memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program. The local instances of the global message passing application are configured to replay the message descriptors during a subsequent iteration of the executing application program.
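The record-and-replay behavior summarized in the abstract can be illustrated with a minimal Python sketch. It is a software model only, not the patented implementation: the names `MessageDescriptor`, `InjectionFifo`, `DmaEngine`, and `run_iterations`, the descriptor fields, and the head/tail layout of the FIFO metadata are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MessageDescriptor:
    """Hypothetical descriptor: where a message goes and which bytes it carries."""
    dest_node: int
    buffer_offset: int
    length: int


@dataclass
class InjectionFifo:
    """Descriptor slots in node memory plus the metadata describing them."""
    slots: List[MessageDescriptor] = field(default_factory=list)
    head: int = 0  # metadata (assumed): next descriptor the engine will consume
    tail: int = 0  # metadata (assumed): one past the last valid descriptor

    def record(self, desc: MessageDescriptor) -> None:
        """Append a descriptor and advance the tail so the engine can see it."""
        self.slots.append(desc)
        self.tail = len(self.slots)

    def rewind(self) -> None:
        """Replay: reset the head so the recorded descriptors are consumed again."""
        self.head = 0


class DmaEngine:
    """Stand-in for the per-node engine; it only follows the FIFO metadata."""

    def drain(self, fifo: InjectionFifo) -> List[MessageDescriptor]:
        sent = []
        while fifo.head < fifo.tail:
            sent.append(fifo.slots[fifo.head])  # "inject" the described message
            fifo.head += 1
        return sent


def run_iterations(pattern: List[Tuple[int, int, int]], iterations: int):
    """Record the pattern in iteration 0, then replay it in later iterations."""
    fifo, engine = InjectionFifo(), DmaEngine()
    history = []
    for it in range(iterations):
        if it == 0:
            for dest, offset, length in pattern:
                fifo.record(MessageDescriptor(dest, offset, length))
        else:
            fifo.rewind()  # descriptors are reused, not rebuilt
        history.append(engine.drain(fifo))
    return history


if __name__ == "__main__":
    for i, sent in enumerate(run_iterations([(3, 0, 256), (7, 256, 128)], 3)):
        print(i, [(d.dest_node, d.length) for d in sent])
```

In this model, every iteration after the first costs only a metadata reset, because the descriptors recorded for the communication pattern are already sitting in the FIFO.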
25 Claims
1. A parallel computer system comprising:
a plurality of compute nodes, each of said compute nodes comprising:
at least one processor;
at least one memory; and
a direct memory address engine coupled to said at least one processor and said at least one memory; and
a network interconnecting said plurality of compute nodes;
wherein:
said network operates a global message-passing application for performing communications across said network;
local instances of said global message-passing application operate at each of said compute nodes to carry out local processing operations independent of processing operations carried out at another one of said compute nodes;
said direct memory address engines are configured to interact with said local instances of said global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of said memories;
said local instances of said global message passing application are configured to record, in said injection FIFO in said corresponding one of said memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and
said local instances of said global message passing application are configured to replay said message descriptors during a subsequent iteration of said executing application program.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method comprising the steps of:
providing a parallel computer system comprising:
a plurality of compute nodes, each of said compute nodes comprising:
at least one processor;
at least one memory; and
a direct memory address engine coupled to said at least one processor and said at least one memory; and
a network interconnecting said plurality of compute nodes;
wherein:
said network operates a global message-passing application for performing communications across said network;
local instances of said global message-passing application operate at each of said compute nodes to carry out local processing operations independent of processing operations carried out at another one of said compute nodes; and
said direct memory address engines are configured to interact with said local instances of said global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of said memories;
recording, with said local instances of said global message passing application, in said injection FIFO in said corresponding one of said memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and
replaying said message descriptors, with said local instances of said global message passing application, during a subsequent iteration of said executing application program.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23
24. A computer program product comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to implement a global message-passing application for performing communications across the network, with local instances of said global message-passing application operating at each of the compute nodes to carry out local processing operations independent of processing operations carried out at another one of the compute nodes;
computer readable program code configured to facilitate interaction between said direct memory address engines and said local instances of said global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of said memories;
computer readable program code configured to facilitate recording message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and
computer readable program code configured to facilitate replaying said message descriptors during a subsequent iteration of said executing application program.
25. An apparatus comprising:
a plurality of compute nodes, each of said compute nodes comprising:
at least one processor;
at least one memory; and
a direct memory address engine coupled to said at least one processor and said at least one memory;
means for interconnecting said plurality of compute nodes;
means for global message-passing communications across said means for interconnecting, said means for global message passing operating at each of said compute nodes to carry out local processing operations independent of processing operations carried out at another one of said compute nodes; and
means for facilitating interaction between said direct memory address engines and said means for global message-passing;
means for recording message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and
means for replaying said message descriptors during a subsequent iteration of said executing application program.
Specification