Method to recover from node failure/recovery incidents in distributed systems in which notification does not occur
Abstract
Epoch numbers are maintained in a pair wise fashion at a plurality of communication endpoints to provide communication consistency and recovery from a range of failure conditions, including total or partial node failure and subsequent recovery. Once an epoch state inconsistency is recognized, negotiation procedures provide an effective mechanism to reestablish valid communication links without employing global variables, which inherently impose greater transmission and overhead requirements. Renegotiation of recognizably valid epoch numbers occurs on a pair wise basis.
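To make the pair wise bookkeeping concrete, the following is a minimal sketch in Python, not the patented implementation. The `Endpoint` class, its method names, the message tuple layout, and the starting epoch of 0 for an unknown peer are all assumptions introduced for illustration. It shows only that each endpoint keeps a separate epoch per peer, so an inconsistency on one pair does not disturb any other pair.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """One communication endpoint holding a separate epoch per peer."""
    name: str
    # Pair wise state: peer name -> epoch number used with that peer only.
    epochs: dict = field(default_factory=dict)

    def epoch_for(self, peer: str) -> int:
        # Assumption: an unknown peer starts at epoch 0.
        return self.epochs.setdefault(peer, 0)

    def stamp(self, peer: str, payload: str) -> tuple:
        """Attach the epoch held for `peer` to an outgoing message."""
        return (self.name, self.epoch_for(peer), payload)

    def accepts(self, message: tuple) -> bool:
        """Accept only if the message epoch matches the epoch held for the sender."""
        sender, epoch, _payload = message
        return epoch == self.epoch_for(sender)


a, b, c = Endpoint("A"), Endpoint("B"), Endpoint("C")

# The A-B pair has drifted apart; the A-C pair has not.
a.epochs["B"] = 3
b.epochs["A"] = 2

b_ok = b.accepts(a.stamp("B", "hello"))  # mismatch seen on the A-B pair only
c_ok = c.accepts(a.stamp("C", "hello"))  # the A-C pair is unaffected
```

Because no epoch is shared across pairs, only A and B would need to renegotiate in this example; C never observes the discrepancy.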
3 Claims
1. A method for establishing communications in a distributed data processing system, said method comprising:
determining the existence of a state discrepancy between a communication source and a communication destination through the use of a pair wise epoch indication for said communication source and said communication destination;
determining from said epoch indications that said state discrepancy indicates that neither said source nor said destination has undergone a system reset and that a failure at said source has occurred;
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said source to exit from a non-ready state with respect to said destination, wherein said negotiating includes the steps of:
receiving at said destination a message with a non-matching epoch indication, dropping said message and sending a not-ready acknowledgment message to said source along with the same non-matching epoch indication sent by said source;
receiving at said source said not-ready acknowledgment message, setting said source into a non-ready state with respect to said destination, incrementing the received epoch indication and sending a ready request message to said destination;
receiving at said destination said ready request message from said source, setting its epoch indication with respect to said source to the received epoch indication and sending a ready acknowledgment message to said source;
receiving at said source said ready acknowledgment message, setting its ready state with respect to said destination and transmitting a message to said destination; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.
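The negotiation steps of claim 1 can be sketched as a small single-pair simulation. This is an illustrative reading only: the class names, message tags, the `run` driver, and the choice to carry the epoch on the ready acknowledgment are assumptions, as is marking the destination non-ready when it drops the mismatched message (the claim only states that the destination sets its ready state at the end).

```python
from dataclasses import dataclass

DATA, NOT_READY_ACK, READY_REQ, READY_ACK = (
    "DATA", "NOT_READY_ACK", "READY_REQ", "READY_ACK")


@dataclass
class PeerState:
    epoch: int   # pair wise epoch held for the other endpoint
    ready: bool  # ready state with respect to the other endpoint


class Source:
    def __init__(self, epoch):
        self.dest = PeerState(epoch, True)

    def send_data(self):
        return (DATA, self.dest.epoch)

    def on_message(self, kind, epoch):
        if kind == NOT_READY_ACK:
            # Enter non-ready state, increment the echoed epoch, ask to resume.
            self.dest.ready = False
            self.dest.epoch = epoch + 1
            return (READY_REQ, self.dest.epoch)
        if kind == READY_ACK:
            # Return to ready state and transmit with the new epoch.
            self.dest.ready = True
            return self.send_data()
        return None


class Destination:
    def __init__(self, epoch):
        self.src = PeerState(epoch, True)
        self.delivered = 0

    def on_message(self, kind, epoch):
        if kind == DATA:
            if epoch != self.src.epoch:
                # Drop the message and echo back the same non-matching epoch.
                self.src.ready = False  # assumption, see lead-in
                return (NOT_READY_ACK, epoch)
            self.src.ready = True
            self.delivered += 1
            return None
        if kind == READY_REQ:
            # Adopt the epoch proposed by the source.
            self.src.epoch = epoch
            return (READY_ACK, epoch)
        return None


def run(source, destination, limit=10):
    """Bounce messages between the two endpoints and record the message kinds."""
    trace, msg, to_dest = [], source.send_data(), True
    while msg is not None and len(trace) < limit:
        trace.append(msg[0])
        receiver = destination if to_dest else source
        msg = receiver.on_message(*msg)
        to_dest = not to_dest
    return trace


src = Source(epoch=5)       # source and destination disagree: 5 versus 4
dst = Destination(epoch=4)
trace = run(src, dst)
```

Under these assumptions the exchange is DATA, NOT_READY_ACK, READY_REQ, READY_ACK, DATA, after which both sides hold epoch 6 and are ready with respect to each other.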

2. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination through the use of pair wise epoch indications for said communication source and said communication destination;
determining from said epoch indications that said discrepancy indicates that said source has undergone a system reset;
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source;
said source setting its epoch number so as to indicate a system reset at said source and sending said epoch indication to said destination with a ready request message;
receiving said epoch indication at said destination and maintaining a count of such received epoch numbers until a threshold is reached;
once said threshold is reached, incrementing said epoch indication and sending a ready acknowledgment message to said source along with said updated epoch indication;
receiving at said source said ready acknowledgment message and said updated epoch indication, setting its epoch indication with respect to said destination to said updated epoch indication, setting its ready state with respect to said destination and transmitting a message to said destination; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.
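The source-reset path of claim 2 can be sketched in the same way. The claim does not give a reset value, a threshold, or a retransmission policy, so `RESET_EPOCH = 0`, `THRESHOLD = 3`, and the loop that resends the ready request until it is acknowledged are assumptions. The sketch also assumes a literal reading in which the destination increments the received reset indication; the claim could equally be read as incrementing the destination's stored epoch.

```python
RESET_EPOCH = 0  # assumption: sentinel value meaning "this endpoint was reset"
THRESHOLD = 3    # assumption: the claim does not state a value


class ResetSource:
    """A source that has just come back from a system reset."""

    def __init__(self):
        self.epoch = RESET_EPOCH
        self.ready = False

    def ready_request(self):
        return ("READY_REQ", self.epoch)

    def on_ready_ack(self, epoch):
        # Adopt the updated epoch, become ready, and transmit with it.
        self.epoch = epoch
        self.ready = True
        return ("DATA", self.epoch)


class Destination:
    def __init__(self, epoch):
        self.epoch = epoch    # epoch held for the source before its reset
        self.ready = True
        self.reset_count = 0  # reset indications seen so far

    def on_ready_request(self, epoch):
        if epoch != RESET_EPOCH:
            return None       # not the reset path; out of scope for this sketch
        self.ready = False
        self.reset_count += 1
        if self.reset_count < THRESHOLD:
            return None       # keep counting, send nothing yet
        self.reset_count = 0
        self.epoch = epoch + 1  # assumption: increment the received indication
        return ("READY_ACK", self.epoch)

    def on_data(self, epoch):
        if epoch == self.epoch:
            self.ready = True
            return True
        return False


src = ResetSource()
dst = Destination(epoch=7)

ack, attempts = None, 0
while ack is None and attempts < 10:  # assumption: source retransmits
    _kind, epoch = src.ready_request()
    ack = dst.on_ready_request(epoch)
    attempts += 1

_kind, new_epoch = ack
_kind, data_epoch = src.on_ready_ack(new_epoch)
delivered = dst.on_data(data_epoch)
```

With these values the destination acknowledges the third ready request, and both endpoints finish ready at epoch 1.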

3. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination through the use of pair wise epoch indications for said communication source and said communication destination;
determining from said epoch indications that said discrepancy indicates that said destination has undergone a system reset;
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source, said negotiation including the steps of:
said destination setting its epoch indication so as to indicate a system reset at said destination and sending said epoch indication to said source with a not ready acknowledgment message;
receiving at said source said epoch indication and said not ready acknowledgment message and maintaining a count of such received epoch indication until a threshold is reached;
once said threshold is reached, incrementing said epoch indication at said source and sending a ready request message to said destination along with said incremented epoch indication;
receiving at said destination said ready request message and said incremented epoch indication, setting its epoch number with respect to said destination to said incremented epoch indication, and sending a ready acknowledgment message to said source;
receiving at said source said ready acknowledgment message, setting said ready state with respect to said destination and transmitting a message to said destination with said incremented epoch indication; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.
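The destination-reset path of claim 3 mirrors the previous case with the roles of counter and responder swapped. As before, this is a hedged sketch: `RESET_EPOCH = 0`, `THRESHOLD = 2`, the class and method names, and the assumption that the source keeps sending data while it counts not ready acknowledgments are all illustrative. It further assumes that "incrementing said epoch indication at said source" means the source advances its own stored epoch for the destination.

```python
RESET_EPOCH = 0  # assumption: sentinel value meaning "this endpoint was reset"
THRESHOLD = 2    # assumption: the claim does not state a value


class Source:
    def __init__(self, epoch):
        self.epoch = epoch    # epoch held for the destination
        self.ready = True
        self.reset_acks = 0   # reset indications seen so far

    def on_not_ready_ack(self, epoch):
        if epoch != RESET_EPOCH:
            return None       # not the reset path; out of scope for this sketch
        self.ready = False
        self.reset_acks += 1
        if self.reset_acks < THRESHOLD:
            return None       # keep counting, send nothing yet
        self.reset_acks = 0
        self.epoch += 1       # assumption: advance the source's stored epoch
        return ("READY_REQ", self.epoch)

    def on_ready_ack(self):
        self.ready = True
        return ("DATA", self.epoch)


class RebootedDestination:
    """A destination that has just come back from a system reset."""

    def __init__(self):
        self.epoch = RESET_EPOCH
        self.ready = False

    def on_data(self, epoch):
        if epoch != self.epoch:
            # Reject and report the destination's own (reset) epoch.
            return ("NOT_READY_ACK", self.epoch)
        self.ready = True
        return None

    def on_ready_request(self, epoch):
        self.epoch = epoch
        return ("READY_ACK", epoch)


src = Source(epoch=7)
dst = RebootedDestination()

request, sent = None, 0
while request is None and sent < 10:  # assumption: source keeps sending
    sent += 1
    _kind, reported = dst.on_data(src.epoch)  # rejected while epochs differ
    request = src.on_not_ready_ack(reported)

_kind, new_epoch = request
dst.on_ready_request(new_epoch)
_kind, data_epoch = src.on_ready_ack()
final_reply = dst.on_data(data_epoch)
```

Here the source proposes epoch 8 after the second not ready acknowledgment, and the final data message is accepted because both sides then hold epoch 8.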
Specification