Network Fault Detection and Reconfiguration

  • US 20130297976A1
  • Filed: 03/07/2013
  • Published: 11/07/2013
  • Est. Priority Date: 05/04/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method for detecting communication faults in a parallel system, the method comprising:

    sending ping messages by a node of the parallel system to one or more destination nodes, wherein the parallel system comprises a plurality of nodes communicating with each other using a plurality of links, each node comprising a processor;

    waiting to receive acknowledgements from each destination node indicating the destination node received the ping message;

    responsive to failure to receive one or more acknowledgement message, detecting failure of corresponding one or more ping messages to reach their destination nodes; and

    responsive to detecting failure of one or more ping messages to reach their target nodes, identifying faulty component in the parallel system, the identifying comprising:

        freezing communications in the parallel system by sending a request to nodes of the system to stop sending and receiving messages except for ping messages;

        sending ping messages through different components of the parallel system;

        identifying the faulty component based on failure to deliver a ping message through the component; and

        unfreezing the parallel system by sending requests to the nodes of the system to restart sending and receiving messages other than ping messages.
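
The steps recited in the claim can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `ParallelSystem`, `detect_failures`, and `identify_faulty_links` are hypothetical, the only "components" modeled are direct links between nodes, and a missing acknowledgement is modeled as `ping` returning `False` rather than as a real timeout.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelSystem:
    """Toy model (hypothetical): nodes joined by undirected links."""
    links: set                                   # set of frozenset({a, b})
    faulty_links: set = field(default_factory=set)
    frozen: bool = False
    log: list = field(default_factory=list)

    def ping(self, src, dst):
        """Return True if an acknowledgement would come back over src-dst.

        Assumption: a ping travels over the direct link only, and a lost
        acknowledgement (a timeout in a real system) is reported as False.
        """
        link = frozenset((src, dst))
        return link in self.links and link not in self.faulty_links

    def send_data(self, src, dst):
        """Ordinary (non-ping) traffic is refused while the system is frozen."""
        return (not self.frozen) and self.ping(src, dst)

    def freeze(self):
        """Ask every node to stop all traffic except ping messages."""
        self.frozen = True
        self.log.append("freeze")

    def unfreeze(self):
        """Ask every node to resume ordinary traffic."""
        self.frozen = False
        self.log.append("unfreeze")


def detect_failures(system, node, destinations):
    """Ping each destination; return those that never acknowledged."""
    return [dst for dst in destinations if not system.ping(node, dst)]


def identify_faulty_links(system):
    """Freeze, ping through every link, collect failing links, unfreeze."""
    system.freeze()
    try:
        faulty = [
            tuple(sorted(link))
            for link in system.links
            if not system.ping(*link)
        ]
    finally:
        # Always resume normal traffic, even if probing raised an error.
        system.unfreeze()
    return sorted(faulty)
```

In this sketch a node would first call `detect_failures` as its routine health check, and call `identify_faulty_links` only when that check reports a missing acknowledgement. The `try`/`finally` is a design choice of the sketch, not something the claim specifies; it keeps the simulated system from staying frozen if probing fails partway through.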

  • 6 Assignments