Method to recover from node failure/recovery incidents in distributed systems in which notification does not occur
Abstract
Epoch numbers are maintained in a pair wise fashion at a plurality of communication endpoints to provide communication consistency and recovery from a range of failure conditions including total or partial node failure and subsequent recovery. Once an epoch state inconsistency is recognized, negotiation procedures provide an effective mechanism to reestablish valid communication links without the need to employ global variables which inherently possess greater transmission and overhead requirements needed to maintain communications. Renegotiation of recognizably valid epoch numbers occurs on a pair wise basis.
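As an illustration only, the pair wise bookkeeping described in the abstract can be sketched in Python. Every name here (`Endpoint`, `epoch_for`, `negotiate`, the message fields) is hypothetical, and two rules are assumptions rather than anything stated in the source: that a restarted endpoint loses its epoch table, and that renegotiation picks a value one higher than either side's current epoch.

```python
class Endpoint:
    """A communication endpoint that keeps one epoch number per peer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.epochs = {}  # peer name -> epoch number valid for that pair only

    def epoch_for(self, peer_name):
        # Pairs that have never negotiated start at epoch 0 (assumption).
        return self.epochs.setdefault(peer_name, 0)

    def send(self, peer, payload):
        # Each message carries the sender's epoch for this particular pair.
        return {"src": self.name, "epoch": self.epoch_for(peer.name), "payload": payload}

    def receive(self, msg):
        # Accept only when the sender's epoch matches our own record for that sender.
        return msg["epoch"] == self.epoch_for(msg["src"])

    def restart(self):
        # Assumption: a failed and recovered endpoint has lost its epoch table.
        self.epochs.clear()


def negotiate(a, b):
    """Agree on a fresh epoch for the pair (a, b) without touching any other pair."""
    new_epoch = max(a.epoch_for(b.name), b.epoch_for(a.name)) + 1
    a.epochs[b.name] = new_epoch
    b.epochs[a.name] = new_epoch
    return new_epoch


a, b, c = Endpoint("a"), Endpoint("b"), Endpoint("c")

first_epoch = negotiate(a, b)
accepted_before = b.receive(a.send(b, "hello"))

b.restart()  # b fails and recovers; a receives no notification
accepted_after_restart = b.receive(a.send(b, "hello again"))

second_epoch = negotiate(a, b)  # the inconsistency triggers pair wise renegotiation
accepted_after_negotiation = b.receive(a.send(b, "hello once more"))
```

In this sketch the renegotiation between `a` and `b` never reads or writes the epoch that either endpoint holds for `c`, which is the sense in which no global state has to be maintained or transmitted.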
11 Claims
1. A method, for establishing communications in a distributed data processing system, said method comprising:
- maintaining, at each one of a plurality of communication endpoints in said distributed data processing system, pair wise epoch number indications relative to other ones of said plurality of communication endpoints and negotiating message transmission between pairs of said endpoints based on said pair wise epoch number indications, whereby the need for global status maintenance and transmission is eliminated.
2. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination;
determining that said discrepancy indicates that neither said source nor said destination has undergone a system reset and that a failure at said source has occurred; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said source to exit from a non-ready state with respect to said destination.
4. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination;
determining that said discrepancy indicates that neither said source nor said destination has undergone a system reset and that a failure at said destination has occurred; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source.
6. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination;
determining that said discrepancy indicates that said source has undergone a system reset; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source.
8. A method for establishing communication in a distributed data processing system, said method comprising the steps of:
determining the existence of a state discrepancy between a communication source and a communication destination;
determining that said discrepancy indicates that said destination has undergone a system reset; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source.
10. A computer readable medium having computer executable instructions causing a computer to maintain at each one of a plurality of communication endpoints in a distributed data processing system, pair wise epoch number indications relative to other ones of said plurality of communication endpoints and negotiate message transmission between pairs of said endpoints based on said pair wise epoch number indications.
11. A distributed data processing system having a plurality of nodes containing executable instructions, in memory locations within the nodes of said distributed data processing system, for causing nodes in said distributed data processing system to maintain at each one of a plurality of said nodes in said distributed data processing system, pair wise epoch number indications relative to other ones of said plurality of nodes in said distributed data processing system and for causing said nodes in said distributed data processing system to negotiate message transmission between pairs of said nodes in said distributed data processing system based on said pair wise epoch number indications.
Specification