Recording A Communication Pattern and Replaying Messages in a Parallel Computing System

  • US 20110010471A1
  • Filed: 07/10/2009
  • Published: 01/13/2011
  • Est. Priority Date: 07/10/2009
  • Status: Active Grant
First Claim

1. A parallel computer system comprising:

  • a plurality of compute nodes, each of said compute nodes comprising:

    at least one processor;

    at least one memory; and

    a direct memory address engine coupled to said at least one processor and said at least one memory; and

  • a network interconnecting said plurality of compute nodes;

    wherein:

    said network operates a global message-passing application for performing communications across said network;

    local instances of said global message-passing application operate at each of said compute nodes to carry out local processing operations independent of processing operations carried out at another one of said compute nodes;

    said direct memory address engines are configured to interact with said local instances of said global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of said memories;

    said local instances of said global message passing application are configured to record, in said injection FIFO in said corresponding one of said memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program; and

    said local instances of said global message passing application are configured to replay said message descriptors during a subsequent iteration of said executing application program.
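
The record-and-replay behavior recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `MessageDescriptor`, `InjectionFifo`, `DmaEngine`, and `run_iterations`, the descriptor fields, and the use of head/tail indices as the FIFO metadata are all hypothetical choices made for illustration. The sketch assumes that descriptors written during the first iteration stay in memory, so that a later iteration only has to reset the metadata for the engine to process the same descriptors again.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MessageDescriptor:
    """Hypothetical descriptor: where a message goes and which bytes it carries."""
    dest_node: int
    offset: int
    length: int


@dataclass
class InjectionFifo:
    """Hypothetical injection FIFO held in a node's memory.

    `head` and `tail` stand in for the injection FIFO metadata:
    the engine processes descriptors in the range [head, tail).
    """
    descriptors: List[MessageDescriptor] = field(default_factory=list)
    head: int = 0
    tail: int = 0

    def record(self, desc: MessageDescriptor) -> None:
        # First iteration: store the descriptor and make it visible to the engine.
        self.descriptors.append(desc)
        self.tail = len(self.descriptors)

    def replay(self) -> None:
        # Later iterations: no descriptors are rebuilt; only the metadata is
        # reset so the engine walks the recorded descriptors again.
        self.head = 0
        self.tail = len(self.descriptors)


class DmaEngine:
    """Hypothetical engine that drains the FIFO and logs what it would send."""

    def __init__(self) -> None:
        self.sent: List[MessageDescriptor] = []

    def drain(self, fifo: InjectionFifo) -> None:
        while fifo.head < fifo.tail:
            self.sent.append(fifo.descriptors[fifo.head])
            fifo.head += 1


def run_iterations(pattern: List[MessageDescriptor], iterations: int) -> List[MessageDescriptor]:
    """Record the pattern in iteration 0, then replay it in every later iteration."""
    fifo = InjectionFifo()
    engine = DmaEngine()
    for iteration in range(iterations):
        if iteration == 0:
            for desc in pattern:
                fifo.record(desc)
        else:
            fifo.replay()
        engine.drain(fifo)
    return engine.sent
```

In this sketch the per-message work of building descriptors happens once, and each subsequent iteration costs only the metadata reset in `replay()`. That is the kind of saving such a scheme would aim for when an application repeats the same communication pattern across iterations.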

  • 2 Assignments