Template based parallel checkpointing in a massively parallel computer system

  • US 7,487,393 B2
  • Filed: 04/16/2008
  • Issued: 02/03/2009
  • Est. Priority Date: 04/14/2005
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for checkpointing a massively parallel computer system comprising the steps of:

    a) a checkpoint server broadcasting a list of data block checksums from a previous checkpoint to all compute nodes arranged in a cluster; and

    b) each compute node searching its own memory image for checksum matches using an rsync protocol rolling checksum algorithm;

    wherein each node performs the steps of;

    1) producing a template of new data blocks with checksums that didn't exist in the previous checkpoint;

    2) producing a template of references to the original data blocks that did exist in the previous checkpoint;

    3) sending its new data block checksum template to an adjacent node in the cluster of nodes;

    4) comparing checksums to find common data blocks between all adjacent nodes as well as its own data blocks;

    5) informing adjacent nodes to replace a reference to a common data block with a reference to a data block on another node;

    c) the checkpoint server then collecting reference templates from the compute nodes and storing them in the checkpoint server; and

    d) collecting new unique data blocks and storing them to the checkpoint server.
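
As an illustration only, steps b), 1) and 2) of the claim can be sketched for a single node as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the 4-byte block size, and the use of MD5 as the strong checksum are all hypothetical choices made for the example. It shows an rsync-style rolling weak checksum sliding over a node's memory image, with matches recorded as references to previous-checkpoint blocks and everything else recorded as new data. The exchange of templates between adjacent nodes (steps 3 to 5) and the collection by the checkpoint server (steps c and d) are not shown.

```python
import hashlib

BLOCK = 4  # assumed block size, kept tiny so the example is easy to follow


def weak_checksum(data: bytes) -> tuple[int, int]:
    """rsync-style weak checksum parts (a, b), each modulo 2**16."""
    n = len(data)
    a = sum(data) & 0xFFFF
    b = sum((n - i) * byte for i, byte in enumerate(data)) & 0xFFFF
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte to the right without rescanning it."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def strong_checksum(data: bytes) -> str:
    # Assumption: MD5 stands in for whatever strong checksum is actually used.
    return hashlib.md5(data).hexdigest()


def previous_checksums(prev_image: bytes) -> dict:
    """Checksum list a server could broadcast: weak -> {strong: block index}."""
    table: dict = {}
    for off in range(0, len(prev_image) - BLOCK + 1, BLOCK):
        block = prev_image[off:off + BLOCK]
        table.setdefault(weak_checksum(block), {})[strong_checksum(block)] = off // BLOCK
    return table


def build_templates(image: bytes, table: dict):
    """Return (references, new_blocks) for one node's memory image.

    references: (offset in image, block index in the previous checkpoint)
    new_blocks: (offset in image, literal bytes absent from the previous checkpoint)
    """
    refs, new = [], []
    pos, lit_start, n = 0, 0, len(image)
    ab = None  # current weak checksum, or None when it must be recomputed
    while pos + BLOCK <= n:
        window = image[pos:pos + BLOCK]
        if ab is None:
            ab = weak_checksum(window)
        candidates = table.get(ab)
        # Only pay for the strong checksum when the weak one matches.
        hit = candidates.get(strong_checksum(window)) if candidates else None
        if hit is not None:
            if lit_start < pos:
                new.append((lit_start, image[lit_start:pos]))
            refs.append((pos, hit))
            pos += BLOCK
            lit_start = pos
            ab = None
        else:
            if pos + BLOCK < n:
                ab = roll(*ab, image[pos], image[pos + BLOCK], BLOCK)
            pos += 1
    if lit_start < n:
        new.append((lit_start, image[lit_start:]))
    return refs, new
```

For example, with a previous image of b"AAAABBBBCCCC" and a current image of b"AAAAxBBBBCCCC", build_templates returns the references [(0, 0), (5, 1), (9, 2)] and the single new block [(4, b"x")]: the one inserted byte shifts the later blocks, and the rolling checksum still finds them at their new offsets.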
