System and program product to recover from node failure/recovery incidents in distributed systems in which notification does not occur
Abstract
Epoch numbers are maintained in a pair wise fashion at a plurality of communication endpoints to provide communication consistency and recovery from a range of failure conditions including total or partial node failure and subsequent recovery. Once an epoch state inconsistency is recognized, negotiation procedures provide an effective mechanism to reestablish valid communication links without the need to employ global variables which inherently possess greater transmission and overhead requirements needed to maintain communications. Renegotiation of recognizably valid epoch numbers occurs on a pair wise basis.
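The pair wise bookkeeping described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the names `PeerState`, `Endpoint`, `state_for`, and `can_talk_to` are hypothetical and do not come from the patent, and an integer counter stands in for the epoch indication.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """What one endpoint remembers about a single peer (hypothetical layout)."""
    epoch: int = 0
    ready: bool = False


@dataclass
class Endpoint:
    name: str
    peers: dict = field(default_factory=dict)  # peer name -> PeerState

    def state_for(self, peer: str) -> PeerState:
        # Entries are created per peer; no system-wide epoch is shared.
        return self.peers.setdefault(peer, PeerState())

    def can_talk_to(self, other: "Endpoint") -> bool:
        # Communication proceeds only when both sides hold the same epoch
        # for this particular pair.
        return self.state_for(other.name).epoch == other.state_for(self.name).epoch


a, b, c = Endpoint("A"), Endpoint("B"), Endpoint("C")
a.state_for("B").epoch = 3
b.state_for("A").epoch = 3
a.state_for("C").epoch = 7  # C still holds 0 for A, so this pair disagrees
```

In this sketch a mismatch between A and C has no effect on the A/B pair, which is the point of keeping the epoch per pair rather than as a global variable.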
15 Claims
1. A method for establishing communication in a distributed data processing system, said method comprising:
maintaining at one communication endpoint of a plurality of communication endpoints in said distributed data processing system a pair wise epoch indication that is relative to another communication endpoint of said plurality of communication endpoints, wherein said pair wise epoch indication is specific to the pair of communication endpoints that include the one communication endpoint and the another communication endpoint, and wherein the pair wise epoch indication at the one communication endpoint comprises an epoch indication with respect to the another communication endpoint, the epoch indication representing a state of the one communication endpoint as it relates to the another communication endpoint;
negotiating communication between the pair of communication endpoints, wherein communication between the pair of communication endpoints proceeds responsive to the epoch indication at the one communication endpoint, which is with respect to the another communication endpoint, being at a same level as another epoch indication at the another communication endpoint, which is with respect to the one communication endpoint; and
wherein the one communication endpoint or the another communication endpoint is a source and the other of the one communication endpoint and the another communication endpoint is a destination, and wherein there is a state discrepancy between the source and the destination, and wherein said negotiating further comprises negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said source to exit from a non-ready state with respect to said destination, wherein said negotiating to permit said source to exit from a non-ready state with respect to said destination comprises;
said source incrementing its epoch indication with respect to said destination and sending a ready request message to said destination;
said destination receiving said ready request message, incrementing its epoch indication with respect to said source and sending a ready acknowledgment message to said source;
said source receiving said ready acknowledgment message, setting its own ready state with respect to said destination and transmitting a message to said destination; and
said destination receiving said message with a matching epoch indication and setting its ready state with respect to said source.
Dependent claims: 2, 3, 4

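The four negotiation steps recited in claim 1 can be traced with a minimal sketch. It is an illustration only, not the patented implementation: the `Side` class, the `negotiate` function, and the message labels `READY_REQ`, `READY_ACK`, and `DATA` are hypothetical names, and attaching the epoch to the request and acknowledgment is an assumption (the claim only states that the final message carries a matching epoch indication).

```python
from dataclasses import dataclass


@dataclass
class Side:
    epoch: int = 0       # epoch held with respect to the other endpoint
    ready: bool = False  # ready state with respect to the other endpoint


def negotiate(src: Side, dst: Side) -> list:
    """Run the four steps in order and return the messages exchanged."""
    log = []
    # Step 1: the source increments its epoch and sends a ready request.
    src.epoch += 1
    log.append(("READY_REQ", src.epoch))
    # Step 2: the destination increments its epoch and acknowledges.
    dst.epoch += 1
    log.append(("READY_ACK", dst.epoch))
    # Step 3: the source marks itself ready and sends an ordinary message.
    src.ready = True
    msg_epoch = src.epoch
    log.append(("DATA", msg_epoch))
    # Step 4: the destination becomes ready only if the epoch matches.
    if msg_epoch == dst.epoch:
        dst.ready = True
    return log


src, dst = Side(epoch=4), Side(epoch=4)
trace = negotiate(src, dst)
```

Both sides start at the same epoch but not ready; after the exchange they agree on the next epoch and both hold the ready state for this pair.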
5. A computer program product for establishing communication in a distributed data processing system, the computer program product comprising:
a non-transitory storage medium readable by a processor and storing instructions for execution by the processor for performing a method comprising;
maintaining at one communication endpoint of a plurality of communication endpoints in said distributed data processing system a pair wise epoch indication that is relative to another communication endpoint of said plurality of communication endpoints, wherein said pair wise epoch indication is specific to the pair of communication endpoints that include the one communication endpoint and the another communication endpoint, and wherein the pair wise epoch indication at the one communication endpoint comprises an epoch indication with respect to the another communication endpoint, the epoch indication representing a state of the one communication endpoint as it relates to the another communication endpoint;
negotiating communication between the pair of communication endpoints, wherein communication between the pair of communication endpoints proceeds responsive to the epoch indication at the one communication endpoint, which is with respect to the another communication endpoint, being at a same level as another epoch indication at the another communication endpoint, which is with respect to the one communication endpoint; and
wherein the one communication endpoint or the another communication endpoint is a source and the other of the one communication endpoint and the another communication endpoint is a destination and wherein there is a state discrepancy between the source and the destination, and wherein said negotiating further comprises negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said source to exit from a non-ready state with respect to said destination, wherein said negotiating to permit said source to exit from a non-ready state with respect to said destination comprises;
said source incrementing its epoch indication with respect to said destination and sending a ready request message to said destination;
said destination receiving said ready request message, incrementing its epoch indication with respect to said source and sending a ready acknowledgment message to said source;
said source receiving said ready acknowledgment message, setting its own ready state with respect to said destination and transmitting a message to said destination; and
said destination receiving said message with a matching epoch indication and setting its ready state with respect to said source.
Dependent claims: 6, 7

8. A computer system for establishing communication in a distributed data processing system, the computer system comprising:
a memory; and
a processor in communications with the memory, wherein the computer system is configured to perform a method, said method comprising;
maintaining at one communication endpoint of a plurality of communication endpoints in said distributed data processing system a pair wise epoch indication that is relative to another communication endpoint of said plurality of communication endpoints, wherein said pair wise epoch indication is specific to the pair of communication endpoints that include the one communication endpoint and the another communication endpoint, and wherein the pair wise epoch indication at the one communication endpoint comprises an epoch indication with respect to the another communication endpoint, the epoch indication representing a state of the one communication endpoint as it relates to the another communication endpoint;
negotiating communication between the pair of communication endpoints, wherein communication between the pair of communication endpoints proceeds responsive to the epoch indication at the one communication endpoint, which is with respect to the another communication endpoint, being at a same level as another epoch indication at the another communication endpoint, which is with respect to the one communication endpoint; and
wherein the one communication endpoint or the another communication endpoint is a source and the other of the one communication endpoint and the another communication endpoint is a destination, and wherein there is a state discrepancy between the source and the destination, and wherein said negotiating further comprises negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said source to exit from a non-ready state with respect to said destination, wherein said negotiating to permit said source to exit from a non-ready state with respect to said destination comprises;
said source incrementing its epoch indication with respect to said destination and sending a ready request message to said destination;
said destination receiving said ready request message, incrementing its epoch indication with respect to said source and sending a ready acknowledgment message to said source;
said source receiving said ready acknowledgment message, setting its own ready state with respect to said destination and transmitting a message to said destination; and
said destination receiving said message with a matching epoch indication and setting its ready state with respect to said source.
Dependent claims: 9, 10, 11

12. A computer system for establishing communication in a distributed data processing system, the computer system comprising:
a memory; and
a processor in communications with the memory, wherein the computer system is configured to perform a method, said method comprising;
determining the existence of a state discrepancy between a communication source and a communication destination through the use of a pair wise epoch indication for said source and said destination;
determining from said pair wise epoch indication that said discrepancy indicates that a failure at said source has occurred; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit at least one of;
said source to exit from a non-ready state with respect to said destination; or
said destination to exit from a non-ready state with respect to said source;
wherein said negotiating to permit said source to exit from a non-ready state with respect to said destination comprises;
receiving at said destination a message with a non-matching epoch indication, dropping said message and sending a not-ready acknowledgment message to said source along with the same epoch indication sent by said source;
receiving at said source said not-ready acknowledgment message, setting said source into the non-ready state with respect to said destination, incrementing the received epoch indication and sending a ready request message to said destination;
receiving at said destination said ready request message from said source, setting its epoch indication with respect to said source to the received value and sending a ready acknowledgment message to said source;
receiving at said source said ready acknowledgment message, setting its ready state with respect to said destination and transmitting a message to said destination; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.
Dependent claims: 13

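The recovery sequence of claim 12, in which a mismatched message triggers renegotiation, can be sketched as follows. This is a simplified single-process illustration: `Side`, `recover_from_mismatch`, and the message labels are hypothetical, and the starting values (a source whose epoch fell back to 0 after a failure, a destination that is not yet ready) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Side:
    epoch: int
    ready: bool = True


def recover_from_mismatch(src: Side, dst: Side) -> list:
    """Trace one data message that may arrive with a non-matching epoch."""
    log = []
    sent = src.epoch
    if sent != dst.epoch:
        # Destination drops the message and echoes the epoch it received.
        log.append(("NOT_READY_ACK", sent))
        # Source leaves the ready state, increments the echoed epoch, asks again.
        src.ready = False
        src.epoch = sent + 1
        log.append(("READY_REQ", src.epoch))
        # Destination adopts the received value and acknowledges.
        dst.epoch = src.epoch
        log.append(("READY_ACK", dst.epoch))
        # Source returns to the ready state before retransmitting.
        src.ready = True
    # The (re)transmitted message now carries the agreed epoch.
    log.append(("DATA", src.epoch))
    if src.epoch == dst.epoch:
        dst.ready = True
    return log


src = Side(epoch=0)                # assumed: source lost its epoch in a failure
dst = Side(epoch=6, ready=False)   # assumed: destination still holds the old epoch
trace = recover_from_mismatch(src, dst)
```

Note that the destination here sets its epoch to the received value rather than incrementing its own, which is how claim 12 differs from the exchange in claim 1.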
14. A computer system for establishing communication in a distributed data processing system, the computer system comprising:
a memory; and
a processor in communications with the memory, wherein the computer system is configured to perform a method, said method comprising;
determining the existence of a state discrepancy between a communication source and a communication destination through the use of a pair wise epoch indication for said source and said destination;
determining from said pair wise epoch indication that said discrepancy indicates that said source has undergone a system reset; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source, wherein said negotiating comprises;
said source setting its epoch number so as to indicate the system reset at said source and sending said epoch indication to said destination with a ready request message;
receiving said epoch indication at said destination and maintaining a count of such received epoch indications until a threshold is reached;
responsive to reaching said threshold, updating said epoch indication and sending a ready acknowledgment message to said source along with said updated epoch indication;
receiving at said source said ready acknowledgment message and said updated epoch indication, setting its epoch indication with respect to said destination to said updated epoch indication, setting its ready state with respect to said destination and transmitting a message to said destination; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.

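The threshold counting at the destination in claim 14 can be illustrated with a short sketch. Several details are assumptions made only for the example: the sentinel `RESET_EPOCH = 0` as the value that indicates a reset, the threshold of 3, and modelling "updating" the epoch as an increment. The class and method names are hypothetical.

```python
from dataclasses import dataclass

RESET_EPOCH = 0  # hypothetical sentinel meaning "the source was just reset"
THRESHOLD = 3    # hypothetical number of reset requests to wait for


@dataclass
class Destination:
    epoch: int
    ready: bool = False
    reset_seen: int = 0

    def on_ready_request(self, epoch: int):
        """Return the updated epoch to acknowledge with, or None to stay silent."""
        if epoch != RESET_EPOCH:
            return None
        self.reset_seen += 1
        if self.reset_seen < THRESHOLD:
            return None
        self.reset_seen = 0
        self.epoch += 1  # assumption: "updating" is modelled as an increment
        return self.epoch

    def on_data(self, epoch: int) -> None:
        # The destination becomes ready only on a matching epoch.
        if epoch == self.epoch:
            self.ready = True


dst = Destination(epoch=9)
replies = [dst.on_ready_request(RESET_EPOCH) for _ in range(THRESHOLD)]
src_epoch = replies[-1]  # the source adopts the epoch carried by the acknowledgment
dst.on_data(src_epoch)
```

Waiting for several reset indications before answering is one way to avoid reacting to a single stale or duplicated request; the claim does not state the purpose of the threshold, so that reading is an interpretation.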
15. A computer system for establishing communication in a distributed data processing system, the computer system comprising:
a memory; and
a processor in communications with the memory, wherein the computer system is configured to perform a method, said method comprising;
determining the existence of a state discrepancy between a communication source and a communication destination through the use of a pair wise epoch indication for said source and said destination;
determining from said pair wise epoch indication that said discrepancy indicates that said destination has undergone a system reset; and
negotiating between said source and said destination to establish a new state at said destination consistent with the state at said source so as to permit said destination to exit from a non-ready state with respect to said source, wherein said negotiating comprises;
said destination setting its epoch indication so as to indicate the system reset at said destination and sending said epoch indication to said source with a not ready acknowledgment message;
receiving at said source said epoch indication and said not ready acknowledgment message and maintaining a count of such received epoch indication until a threshold is reached;
responsive to reaching said threshold, incrementing said epoch indication at said source and sending a ready request message to said destination along with said incremented epoch indication;
receiving at said destination said ready request message and said incremented epoch indication, setting its epoch number with respect to said destination to said incremented epoch indication, and sending a ready acknowledgment message to said source;
receiving at said source said ready acknowledgment message, setting said ready state with respect to said destination and transmitting a message to said destination with said incremented epoch indication; and
receiving said message at said destination with a matching epoch indication and setting its ready state with respect to said source.

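Claim 15 mirrors the previous case with the count kept at the source. The sketch below is an illustration under assumptions: `RESET_EPOCH = 0` and a threshold of 2 are invented values, the names are hypothetical, and "incrementing said epoch indication at said source" is read here as incrementing the epoch the source itself holds for this destination.

```python
from dataclasses import dataclass

RESET_EPOCH = 0  # hypothetical sentinel meaning "the destination was just reset"
THRESHOLD = 2    # hypothetical number of not-ready acknowledgments to wait for


@dataclass
class Source:
    epoch: int
    ready: bool = False
    reset_acks: int = 0

    def on_not_ready_ack(self, epoch: int):
        """Return the epoch to send in a ready request, or None to keep waiting."""
        if epoch != RESET_EPOCH:
            return None
        self.reset_acks += 1
        if self.reset_acks < THRESHOLD:
            return None
        self.reset_acks = 0
        self.epoch += 1  # assumption: the source's own epoch is incremented
        return self.epoch

    def on_ready_ack(self) -> int:
        # The source becomes ready and sends a message with the new epoch.
        self.ready = True
        return self.epoch


src = Source(epoch=5)
requests = [src.on_not_ready_ack(RESET_EPOCH) for _ in range(THRESHOLD)]
dst_epoch = requests[-1]             # the destination adopts the incremented epoch
data_epoch = src.on_ready_ack()
dst_ready = data_epoch == dst_epoch  # destination sets ready on a matching epoch
```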
Specification