Maintaining routing consistency within a rendezvous federation
Abstract
The present invention extends to methods, systems, and computer program products for appropriately detecting node failures in a rendezvous federation. A monitor node monitors a subject node. The subject node intermittently renews a time-to-live duration value with the monitor node to indicate to the monitor node that the subject node has not failed. In some embodiments, each node in a pair of nodes monitors the other node in the pair. Thus, each node is both a subject node and a monitor node. In further embodiments, an arbitration facility arbitrates failure reports.
44 Claims
-
1. At a subject node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the subject node and a monitor node, a method for monitoring the subject node for a failure, the method comprising:
-
an act of the subject node generating a subject side time-to-live duration value for use in monitoring of the subject node;
an act of the subject node sending an establish request to the monitor node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including at least the subject side time-to-live duration value;
an act of the subject node establishing an existing subject side time-to-die time based on the subject side time-to-live duration value and the time the establish request was sent, wherein the subject node clock reaching the existing subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state;
an act of the subject node receiving an establish grant from the monitor node, the establish grant indicative of the monitor node monitoring the subject node;
an act of the subject node sending a renew request to the monitor node prior to the subject node clock reaching the existing subject side time-to-die time;
an act of the subject node receiving a renew grant from the monitor node subsequent to sending the renew request and prior to the subject node clock reaching the existing subject side time-to-die time, the renew grant message indicative of the monitor node continuing to monitor the subject node;
an act of the subject node transitioning to a previously calculated updated subject side time-to-die time in response to receiving the renew grant, wherein the subject node clock reaching the updated subject side time-to-die time, prior to receiving another renew grant from the monitor node, is an indication of the subject node having to transition to a failure state. - View Dependent Claims (2, 3, 4, 5, 6)
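The subject-side acts of claim 1 can be read as a small lease state machine. The following is only an illustrative sketch, not the patented implementation: the class name `SubjectLease`, the message dictionaries, the use of seconds as the time unit, and the choice to calculate the updated time-to-die at the moment the renew request is sent are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    AWAITING_ESTABLISH_GRANT = auto()
    MONITORED = auto()
    FAILED = auto()


@dataclass
class SubjectLease:
    """Hypothetical subject-side view of the monitoring relationship."""
    ttl: float  # subject side time-to-live duration value (seconds, assumed unit)
    state: State = State.IDLE
    time_to_die: Optional[float] = None
    pending_time_to_die: Optional[float] = None  # updated time-to-die, calculated ahead of the grant

    def tick(self, now: float) -> None:
        # The subject clock reaching the time-to-die without a grant means failure.
        active = self.state in (State.AWAITING_ESTABLISH_GRANT, State.MONITORED)
        if active and self.time_to_die is not None and now >= self.time_to_die:
            self.state = State.FAILED

    def send_establish(self, now: float) -> dict:
        # Time-to-die is based on the TTL and the time the establish request was sent.
        self.time_to_die = now + self.ttl
        self.state = State.AWAITING_ESTABLISH_GRANT
        return {"type": "establish_request", "ttl": self.ttl}

    def on_establish_grant(self, now: float) -> None:
        self.tick(now)
        if self.state is State.AWAITING_ESTABLISH_GRANT:
            self.state = State.MONITORED

    def send_renew(self, now: float) -> Optional[dict]:
        self.tick(now)
        if self.state is not State.MONITORED:
            return None
        # Assumption: the updated time-to-die is calculated when the renew is sent.
        self.pending_time_to_die = now + self.ttl
        return {"type": "renew_request"}

    def on_renew_grant(self, now: float) -> None:
        self.tick(now)
        if self.state is State.MONITORED and self.pending_time_to_die is not None:
            # Transition to the previously calculated updated time-to-die.
            self.time_to_die = self.pending_time_to_die
            self.pending_time_to_die = None
```

In this sketch a caller would invoke `tick(now)` periodically; a grant that arrives at or after the current time-to-die is ignored because the node has already entered the failure state.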
-
-
7. At a monitor node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the monitor node and a subject node, a method for monitoring the subject node for a suspected failure, the method comprising:
-
an act of the monitor node receiving an establish request from the subject node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including at least a subject side time-to-live duration value, the subject side time-to-live duration value used to determine a subject side time-to-die time at the subject node, wherein the subject node clock reaching the subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state;
an act of the monitor node deriving a monitor side time-to-live duration value from the subject side time-to-live duration value;
an act of the monitor node establishing a monitor side time-to-die time based on the monitor side time-to-live duration value and the time the establish request was received, the monitor node clock reaching the monitor side time-to-die time, prior to receiving a renew request from the subject node, being indicative of a suspected failure of the subject node;
an act of the monitor node sending an establish grant to the subject node to indicate to the subject node that the monitor node has agreed to monitor the subject node;
an act of the monitor node receiving a renew request from the subject node subsequent to sending the establish grant and prior to the monitor node clock reaching the monitor side time-to-die time, the renew request indicating that the subject node has not failed;
an act of the monitor node establishing an updated monitor side time-to-die time in response to and based at least on the time the renew request was received, wherein the monitor node clock reaching the updated monitor side time-to-die time, prior to receiving another renew request from the subject node, is an indication of a suspected failure of the subject node; and
an act of the monitor node sending a renew grant to the subject node to indicate to the subject node that the monitor node has agreed to continue monitoring the subject node. - View Dependent Claims (8, 9, 10, 11)
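The monitor-side acts of claim 7 mirror the subject side. The sketch below is illustrative only. The claim says the monitor side time-to-live is derived from the subject side value but does not say how; adding a fixed `slack` margin, so that the monitor's deadline falls later than the subject's, is an assumption of this example, as are the class name `MonitorLease` and the message dictionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorLease:
    """Hypothetical monitor-side view of one monitored subject node."""
    slack: float = 2.0  # assumed margin added to the subject side TTL (seconds)
    monitor_ttl: Optional[float] = None
    time_to_die: Optional[float] = None
    suspected: bool = False

    def check(self, now: float) -> bool:
        # The monitor clock reaching the time-to-die without a renew request
        # indicates a suspected failure of the subject node.
        if self.time_to_die is not None and now >= self.time_to_die:
            self.suspected = True
        return self.suspected

    def on_establish_request(self, subject_ttl: float, now: float) -> dict:
        # Derive the monitor side TTL from the subject side TTL (assumed rule).
        self.monitor_ttl = subject_ttl + self.slack
        # Time-to-die is based on the time the establish request was received.
        self.time_to_die = now + self.monitor_ttl
        self.suspected = False
        return {"type": "establish_grant"}

    def on_renew_request(self, now: float) -> Optional[dict]:
        if self.monitor_ttl is None or self.check(now):
            return None  # no lease, or lease already expired: no grant
        # Updated time-to-die is based on the time the renew request was received.
        self.time_to_die = now + self.monitor_ttl
        return {"type": "renew_grant"}
```

Under the assumed rule, a subject that stops renewing reaches its own time-to-die before the monitor begins to suspect it, which is one plausible reason for deriving a separate monitor side value.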
-
-
12. At a first node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the first node and a second node, a method for the first node to monitor the second node and the second node to monitor the first node, the method comprising:
-
an act of the first node generating a first node subject side time-to-live duration value for use in monitoring of the first node;
an act of the first node sending an establish request to the second node, the establish request indicative of the first node requesting that the second node monitor the first node, the sent establish request including the first node subject side time-to-live duration value;
an act of the first node establishing a first node subject side time-to-die time based on the first node subject side time-to-live duration value and the time the sent establish request was sent, the first node clock reaching the first node subject side time-to-die time, prior to receiving an establish grant from the second node, being indicative of the first node having to transition to a failure state;
an act of the first node receiving an establish grant from the second node, the establish grant indicative of the second node monitoring the first node;
an act of the first node receiving an establish request from the second node, the received establish request indicative of the second node requesting that the first node monitor the second node, the received establish request including a second node subject side time-to-live duration value, the second node subject side time-to-live duration value used to determine a second node subject side time-to-die time at the second node, wherein the second node clock reaching the second node subject side time-to-die time, prior to receiving an establish grant from the first node, is an indication of the second node having to transition to a failure state;
an act of the first node deriving a first node monitor side time-to-live duration value from the second node subject side time-to-live duration value;
an act of the first node establishing a first node monitor side time-to-die time based on the first node monitor side time-to-live duration value and the time the received establish request was received, wherein the first node clock reaching the first node monitor side time-to-die time, prior to receiving a renew request from the second node, is an indication of a suspected failure of the second node; and
an act of the first node sending an establish grant to the second node to indicate to the second node that the first node has agreed to monitor the second node. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. The method as recited in 22, further comprising:
an act of the first node sending a report to an arbitration facility that the second node is suspected of failing. - View Dependent Claims (24, 25)
26. The method as recited in 25, wherein the act of the first node claiming control of one or more ring resources previously controlled by the second node comprises an act of the first node reclaiming responsibility at least for one or more ring identifiers between the first node and the second node subsequent to expiration of the time period indicated in the accept message.
-
28. The method as recited in 23, further comprising:
-
an act of the first node detecting failure to receive an accept message from the arbitration facility within a maximum response time interval; and
an act of the first node transitioning into a failure state. - View Dependent Claims (29)
-
-
30. The method as recited in 23, further comprising:
-
an act of the first node receiving a deny message from the arbitration facility; and
an act of the first node transitioning into a failure state.
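Claims 26, 28, and 30 together describe how the reporting node reacts to the arbitration facility: an accept lets it take over ring resources once the indicated period expires, while a deny or a missing accept sends it into a failure state. A minimal sketch of that decision follows; the function name `react_to_arbitration`, the `Outcome` enum, and the message layout are hypothetical and chosen only for illustration.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Outcome(Enum):
    CLAIM_RESOURCES_AFTER = auto()  # wait the returned delay, then claim ring resources
    ENTER_FAILURE_STATE = auto()


def react_to_arbitration(
    response: Optional[dict],
    waited: float,
    max_response_time: float,
) -> Tuple[Outcome, float]:
    """Decide what the reporting node does after reporting a suspected failure.

    `response` is None if nothing arrived; `waited` is the time since the report was sent.
    """
    # No accept within the maximum response time interval: fail (claim 28).
    if response is None or waited > max_response_time:
        return Outcome.ENTER_FAILURE_STATE, 0.0
    # Deny message from the arbitration facility: fail (claim 30).
    if response.get("type") == "deny":
        return Outcome.ENTER_FAILURE_STATE, 0.0
    # Accept: take over only after the indicated time period expires (claim 26).
    if response.get("type") == "accept":
        return Outcome.CLAIM_RESOURCES_AFTER, float(response["failure_time"])
    # Unknown message types are treated conservatively (an assumption of this sketch).
    return Outcome.ENTER_FAILURE_STATE, 0.0
```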
-
-
31. The method as recited in 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to a third node assigned to arbitrate for the node pair including the first and second nodes.
-
32. The method as recited in 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to an arbitration facility that has global knowledge of the ring of nodes.
-
33. The method as recited in 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to a seed node.
-
34. In a federation infrastructure including an arbitrator and a ring of nodes configured for bi-directional routing, the ring of nodes including at least the first node and a second node, a method for arbitrating a node failure report, the method comprising:
-
an act of receiving a report from the first node that the second node is suspected of failing;
an act of determining that no other node has suspected the first node of failing within a specified recovery time interval prior to receiving the report from the first node;
an act of sending an accept message to the first node within a maximum response time interval, the accept message including a failure time value indicative of a time period after which the second node is guaranteed to transition into a failure state; and
an act of recording in a list that the second node is to transition to a failure state. - View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 44)
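The arbitration acts of claim 34 can be sketched as a lookup against the recorded list. This is a simplified illustration under stated assumptions: the `Arbitrator` class, the string node identifiers, and the dictionary used as the "list" are hypothetical, and the maximum response time interval is left to the transport layer rather than modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Arbitrator:
    """Hypothetical arbitration facility for failure reports."""
    recovery_interval: float  # specified recovery time interval (seconds, assumed unit)
    failure_time: float       # period after which the suspect is guaranteed to have failed
    # node id -> time at which the node was recorded as having to fail
    failed_at: Dict[str, float] = field(default_factory=dict)

    def on_report(self, reporter: str, suspect: str, now: float) -> dict:
        # Has another node's report against the reporter been accepted within
        # the recovery interval before this report? If so, deny.
        recorded = self.failed_at.get(reporter)
        if recorded is not None and now - recorded <= self.recovery_interval:
            return {"type": "deny"}
        # Record that the suspect is to transition to a failure state.
        self.failed_at[suspect] = now
        # The accept carries the failure time value for the reporter to wait out.
        return {"type": "accept", "failure_time": self.failure_time}
```

With this rule, when two nodes report each other at nearly the same time, only the first report is accepted; the second reporter receives a deny, so at most one node of the pair survives.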
-
Specification