Maintaining routing consistency within a rendezvous federation

US 20080005624A1
Filed: 10/13/2006
Published: 01/03/2008
Est. Priority Date: 10/22/2004
Status: Active Grant

Abstract

The present invention extends to methods, systems, and computer program products for appropriately detecting node failures in a rendezvous federation. A monitor node monitors a subject node. The subject node intermittently renews a time-to-live duration value with the monitor node to indicate to the monitor node that the subject node has not failed. In some embodiments, each node in a pair of nodes monitors the other node in the pair, so each node is both a subject node and a monitor node. In further embodiments, an arbitration facility arbitrates failure reports.

44 Claims

1. At a subject node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the subject node and a monitor node, a method for monitoring the subject node for a failure, the method comprising:
- an act of the subject node generating a subject side time-to-live duration value for use in monitoring of the subject node;
  
  an act of the subject node sending an establish request to the monitor node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including at least the subject side time-to-live duration value;
  
  an act of the subject node establishing an existing subject side time-to-die time based on the subject side time-to-live duration value and the time the establish request was sent, wherein the subject node clock reaching the existing subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state;
  
  an act of the subject node receiving an establish grant from the monitor node, the establish grant indicative of the monitor node monitoring the subject node;
  
  an act of the subject node sending a renew request to the monitor node prior to the subject node clock reaching the existing subject side time-to-die time;
  
  an act of the subject node receiving a renew grant from the monitor node subsequent to sending the renew request and prior to the subject node clock reaching the existing subject side time-to-die time, the renew grant message indicative of the monitor node continuing to monitor the subject node;
  
  an act of the subject node transitioning to a previously calculated updated subject side time-to-die time in response to receiving the renew grant, wherein the subject node clock reaching the updated subject side time-to-die time, prior to receiving another renew grant from the monitor node, is an indication of the subject node having to transition to a failure state.
- - 2. The method as recited in claim 1, further comprising:
    - an act of calculating the updated subject side time-to-die time based on the time the renew request was sent and the subject side TTL duration value prior to receiving the renew grant.
  - 3. The method as recited in claim 1, further comprising:
    - an act of the subject node clock reaching the updated subject side time-to-die time prior to receiving another renew grant from the monitor node.
  - 4. The method as recited in claim 3, wherein the act of the subject node clock reaching the updated subject side time-to-die time prior to receiving a corresponding another renew grant from the monitor node comprises an act of the subject node malfunctioning such that the subject node is prevented from receiving a corresponding another renew request prior to the subject node clock reaching the updated subject side time-to-die time.
  - 5. The method as recited in claim 4, further comprising:
    - an act of the subject node transitioning to a failure state subsequent to the subject node clock reaching the updated subject side time-to-die time.
  - 6. The method as recited in claim 1, further comprising:
    - an act of the subject node sending a second renew request to the monitor node subsequent to receiving the renew grant and prior to the subject node clock reaching the updated subject side time-to-die time;
      
      an act of the subject node calculating what a new updated subject side time-to-die time is to be if a corresponding second renew grant responsive to the second renew request is received prior to expiration of the updated subject side time-to-die time; and
      
      an act of the subject node clock reaching the updated subject side time-to-die time prior to receiving a corresponding second renew grant responsive to the second renew request.
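
The subject-side acts of claims 1 to 6 can be sketched as a small lease object. This is only an illustration under stated assumptions: the class name `SubjectLease`, its method names, and the dictionary message shapes are hypothetical and do not come from the patent, and clock readings are passed in as plain numbers rather than read from a real clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubjectLease:
    """Subject-side view of a monitoring lease (hypothetical helper)."""
    ttl: float                           # subject side time-to-live duration value
    time_to_die: Optional[float] = None  # existing subject side time-to-die time
    pending_ttd: Optional[float] = None  # updated time-to-die, calculated before the grant
    granted: bool = False
    failed: bool = False

    def check(self, now: float) -> bool:
        """Enter the failure state once the clock reaches time-to-die."""
        if self.time_to_die is not None and now >= self.time_to_die:
            self.failed = True
        return self.failed

    def send_establish(self, now: float) -> dict:
        # Time-to-die is based on the TTL and the time the request is sent.
        self.time_to_die = now + self.ttl
        return {"type": "establish_request", "ttl": self.ttl}

    def on_establish_grant(self, now: float) -> None:
        # A grant that arrives after time-to-die is too late.
        if not self.check(now):
            self.granted = True

    def send_renew(self, now: float) -> Optional[dict]:
        if self.check(now):
            return None
        # Calculate the updated time-to-die up front, from the send time.
        self.pending_ttd = now + self.ttl
        return {"type": "renew_request", "ttl": self.ttl}

    def on_renew_grant(self, now: float) -> None:
        if self.check(now) or self.pending_ttd is None:
            return
        # Transition to the previously calculated updated time-to-die.
        self.time_to_die = self.pending_ttd
        self.pending_ttd = None
```

With `ttl=10.0`, an establish request sent at time 0 sets time-to-die to 10; a renew request sent at time 6 and granted at time 7 moves it to 16, because the update is anchored at the send time rather than the grant time.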

7. At a monitor node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the monitor node and a subject node, a method for monitoring the subject node for a suspected failure, the method comprising:
- an act of the monitor node receiving an establish request from the subject node, the establish request indicative of the subject node requesting that the monitor node monitor the subject node, the establish request including at least a subject side time-to-live duration value, the subject side time-to-live duration value used to determine a subject side time-to-die time at the subject node, wherein the subject node clock reaching the subject side time-to-die time, prior to receiving an establish grant from the monitor node, is an indication of the subject node having to transition to a failure state;
  
  an act of the monitor node deriving a monitor side time-to-live duration value from the subject side time-to-live duration value;
  
  an act of the monitor node establishing a monitor side time-to-die time based on the monitor side time-to-live duration value and the time the establish request was received, the monitor node clock reaching the monitor side time-to-die time, prior to receiving a renew request from the subject node, being indicative of a suspected failure of the subject node;
  
  an act of the monitor node sending an establish grant to the subject node to indicate to the subject node that the monitor node has agreed to monitor the subject node;
  
  an act of the monitor node receiving a renew request from the subject node subsequent to sending the establish grant and prior to the monitor node clock reaching the monitor side time-to-die time, the renew request indicating that the subject node has not failed;
  
  an act of the monitor node establishing an updated monitor side time-to-die time in response to and based at least on the time the renew request was received, wherein the monitor node clock reaching the updated monitor side time-to-die time, prior to receiving another renew request from the subject node, is an indication of a suspected failure of the subject node; and
  
  an act of the monitor node sending a renew grant to the subject node to indicate to the subject node that the monitor node has agreed to continue monitoring the subject node.
- - 8. The method as recited in claim 7, further comprising:
    - an act of the monitor node clock reaching the updated monitor side time-to-die time prior to receiving another renew request from the subject node.
  - 9. The method as recited in claim 8, further comprising:
    - an act of the monitor node transitioning to a timeout state.
  - 10. The method as recited in claim 7, wherein the act of the monitor node receiving a renew request from the subject node comprises an act of the monitor node receiving a renew request that includes a new subject side TTL duration value.
  - 11. The method as recited in claim 10, wherein the act of the monitor node establishing an updated monitor side time-to-die time comprises an act of the monitor node establishing an updated monitor side time-to-die time based on the new subject side TTL duration value.
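
A minimal sketch of the monitor-side acts of claims 7 to 11 follows. The claims state only that the monitor side time-to-live duration value is derived from the subject side value; the additive `margin` used here is an assumption made for illustration, as are the class name `MonitorLease` and the message dictionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorLease:
    """Monitor-side view of a monitoring lease (hypothetical helper)."""
    margin: float = 2.0                  # ASSUMED slack added to the subject side TTL
    monitor_ttl: Optional[float] = None  # monitor side time-to-live duration value
    time_to_die: Optional[float] = None  # monitor side time-to-die time
    timed_out: bool = False              # True once the subject is suspected

    def derive_ttl(self, subject_ttl: float) -> float:
        # Assumption: derive by adding a fixed margin.
        return subject_ttl + self.margin

    def check(self, now: float) -> bool:
        """Suspect the subject once the clock reaches monitor side time-to-die."""
        if self.time_to_die is not None and now >= self.time_to_die:
            self.timed_out = True
        return self.timed_out

    def on_establish_request(self, subject_ttl: float, now: float) -> dict:
        self.monitor_ttl = self.derive_ttl(subject_ttl)
        # Based on the derived TTL and the time the request was received.
        self.time_to_die = now + self.monitor_ttl
        return {"type": "establish_grant"}

    def on_renew_request(self, now: float,
                         new_subject_ttl: Optional[float] = None) -> Optional[dict]:
        if self.check(now) or self.time_to_die is None:
            return None  # too late, or nothing was ever established
        if new_subject_ttl is not None:
            # A renew request may carry a new subject side TTL (claims 10, 11).
            self.monitor_ttl = self.derive_ttl(new_subject_ttl)
        self.time_to_die = now + self.monitor_ttl
        return {"type": "renew_grant"}
```

Choosing a monitor side value larger than the subject side value is one plausible design: the subject would then reach its own time-to-die before the monitor begins to suspect it.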

12. At a first node in a federation infrastructure of a ring of nodes configured for bi-directional routing, the ring of nodes including at least the first node and a second node, a method for the first node to monitor the second node and the second node to monitor the first node, the method comprising:
- an act of the first node generating a first node subject side time-to-live duration value for use in monitoring of the first node;
  
  an act of the first node sending an establish request to the second node, the establish request indicative of the first node requesting that the second node monitor the first node, the sent establish request including the first node subject side time-to-live duration value;
  
  an act of the first node establishing a first node subject side time-to-die time based on the first node subject side time-to-live duration value and the time the sent establish request was sent, the first node clock reaching the first node subject side time-to-die time, prior to receiving an establish grant from the second node, being indicative of the first node having to transition to a failure state;
  
  an act of the first node receiving an establish grant from the second node, the establish grant indicative of the second node monitoring the first node;
  
  an act of the first node receiving an establish request from the second node, the received establish request indicative of the second node requesting that the first node monitor the second node, the received establish request including a second node subject side time-to-live duration value, the second node subject side time-to-live duration value used to determine a second node subject side time-to-die time at the second node, wherein the second node clock reaching the second node subject side time-to-die time, prior to receiving an establish grant from the first node, is an indication of the second node having to transition to a failure state;
  
  an act of the first node deriving a first node monitor side time-to-live duration value from the second node subject side time-to-live duration value;
  
  an act of the first node establishing a first node monitor side time-to-die time based on the first node monitor side time-to-live duration value and the time the received establish request was received, wherein the first node clock reaching the first node monitor side time-to-die time, prior to receiving a renew request from the second node, is an indication of a suspected failure of the second node; and
  
  an act of the first node sending an establish grant to the second node to indicate to the second node that the first node has agreed to monitor the second node.
- - 13. The method as recited in claim 12, further comprising:
    - an act of the first node sending a renew request to the second node prior to the first node clock reaching the first node subject side time-to-die time;
      
      an act of the first node calculating what an updated first node subject side time-to-die time is to be if a corresponding renew grant responsive to the sent renew request is received, the calculation based at least on the time the sent renew request was sent and the first node subject side TTL duration value;
      
      an act of the first node receiving a renew grant from the second node subsequent to sending the sent renew request and prior to the first node clock reaching the first node subject side time-to-die time, the renew grant indicative of the second node continuing to monitor the subject node;
      
      an act of the first node transitioning to the updated first node subject side time-to-die time in response to receiving the renew grant, wherein the first node clock reaching the updated first node subject side time-to-die time, prior to receiving another renew grant from the second node, is an indication of the first node having to transition to a failure state.
  - 14. The method as recited in claim 12, further comprising:
    - an act of the first node clock reaching the first node subject side time-to-die time prior to receiving a renew grant from the second node; and
      
      an act of the first node transitioning to a failure state in response to the first node clock reaching the first node subject side time-to-die time.
  - 15. The method as recited in claim 14, wherein the act of the first node clock reaching the first node subject side time-to-die time prior to receiving a renew grant from the second node comprises:
    - an act of the first node sending a renew request to the second node subsequent to receiving the establish grant and prior to the first node clock reaching the first node subject side time-to-die time; and
      
      an act of the first node clock reaching the first node subject side time-to-die time prior to receiving a corresponding renew grant responsive to the sent renew request.
  - 16. The method as recited in claim 15, wherein the act of the first node clock reaching the first node subject side time-to-die time prior to receiving a renew grant from the second node comprises an act of the first node malfunctioning such that the first node is prevented from sending a renew request to the second node prior to the first node clock reaching the first node subject side time-to-die time.
  - 17. The method as recited in claim 14, further comprising:
    - an act of the first node transitioning to a failure state in response to the first node clock reaching the first node subject side time-to-die time.
  - 18. The method as recited in claim 17, further comprising:
    - an act of the first node receiving a renew request from the second node prior to the first node clock reaching the first node monitor side time-to-die time and subsequent to transitioning to a failure state, the received renew request indicating that the second node has not failed; and
      
      an act of the first node denying the renew request.
  - 19. The method as recited in claim 12, further comprising:
    - an act of the first node receiving a renew request from the second node subsequent to sending the sent establish grant and prior to the first node clock reaching the first node monitor side time-to-die time, the received renew request indicating that the second node has not failed;
      
      an act of the first node granting the renew request to the second node;
      
      an act of the first node establishing an updated first node monitor side time-to-die time in response to and based at least on the time the received renew request was received, the first node clock reaching the updated first node monitor side time-to-die time, prior to receiving another renew request from the second node, being indicative of a suspected failure of the second node; and
      
      an act of the first node sending a renew grant to the second node to indicate to the second node that the first node has agreed to continue monitoring the second node.
  - 20. The method as recited in claim 12, further comprising:
    - an act of the first node clock reaching the first node monitor side time-to-die time prior to receiving a renew request from the second node.
  - 21. The method as recited in claim 20, further comprising:
    - an act of the first node transitioning to a timeout state in response to the first node clock reaching the first node monitor side time-to-die time.
  - 22. The method as recited in claim 14 or 21, further comprising:
    - an act of the first node transitioning to a report state.
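
Claims 12 to 22 describe a pair in which each node is both subject and monitor. The sketch below combines both roles in one hypothetical `PairNode` class. The state names, message shapes, additive `margin`, and the choice to ignore late requests while in the timeout state are all assumptions for illustration; the claims themselves specify only the denial of a renew request after the first node has entered a failure state (claim 18).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairNode:
    name: str
    ttl: float                            # this node's subject side TTL
    margin: float = 2.0                   # ASSUMED slack for the monitor side TTL
    subject_ttd: Optional[float] = None   # when this node must fail itself
    pending_ttd: Optional[float] = None   # pre-calculated updated subject_ttd
    monitor_ttd: Optional[float] = None   # when this node suspects its peer
    state: str = "ok"                     # "ok", "failed", or "timeout"

    def tick(self, now: float) -> str:
        if self.state == "ok":
            if self.subject_ttd is not None and now >= self.subject_ttd:
                self.state = "failed"     # own lease expired
            elif self.monitor_ttd is not None and now >= self.monitor_ttd:
                self.state = "timeout"    # peer is suspected of failing
        return self.state

    def establish_request(self, now: float) -> dict:
        self.subject_ttd = now + self.ttl
        return {"type": "establish_request", "ttl": self.ttl}

    def renew_request(self, now: float) -> Optional[dict]:
        if self.tick(now) != "ok":
            return None
        self.pending_ttd = now + self.ttl
        return {"type": "renew_request", "ttl": self.ttl}

    def handle(self, msg: dict, now: float) -> Optional[dict]:
        self.tick(now)
        kind = msg["type"]
        if kind in ("establish_request", "renew_request"):
            if self.state == "failed":
                return {"type": "deny"}   # a failed node denies requests
            if self.state == "timeout":
                return None               # assumption: late requests are ignored
            self.monitor_ttd = now + msg["ttl"] + self.margin
            return {"type": kind.replace("request", "grant")}
        if kind == "renew_grant" and self.state == "ok" and self.pending_ttd is not None:
            self.subject_ttd, self.pending_ttd = self.pending_ttd, None
        return None
```

In this sketch, if node A stops renewing, A fails itself at its own time-to-die and denies B's later renew request, while B continues until its monitor side time-to-die for A passes and then enters the timeout state.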

23. The method as recited in claim 22, further comprising:
- an act of the first node sending a report to an arbitration facility that the second node is suspected of failing.
- - 24. The method as recited in claim 23, further comprising:
    - an act of the first node receiving an accept message from the arbitration facility within a maximum response time interval, the accept message including a time value indicative of a time period after which the second node is guaranteed to transition into a failure state.
  - 25. The method as recited in claim 24, further comprising:
    - an act of the first node claiming control of one or more ring resources previously controlled by the second node subsequent to receiving the accept message.

26. The method as recited in claim 25, wherein the act of the first node claiming control of one or more ring resources previously controlled by the second node comprises an act of the first node reclaiming responsibility at least for one or more ring identifiers between the first node and the second node subsequent to expiration of the time period indicated in the accept message.
- - 27. The method as recited in claim 26, wherein the act of the first node reclaiming responsibility at least for one or more ring identifiers between the first node and the second node comprises an act of the first node claiming responsibility for the ring identifier of the second node.
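
The takeover described in claims 25 to 27 can be illustrated with simple modular arithmetic. The sketch assumes integer identifiers on a ring of `ring_size` positions, a clockwise direction from the first node to the second, and a time period measured from receipt of the accept message. None of these details are fixed by the claims, and the function names are hypothetical.

```python
from typing import List


def ids_to_claim(first_id: int, second_id: int, ring_size: int) -> List[int]:
    """Identifiers after first_id up to and including second_id, going clockwise.

    Including second_id itself corresponds to claim 27; the direction and
    the integer ID space are assumptions.
    """
    span = (second_id - first_id) % ring_size
    return [(first_id + step) % ring_size for step in range(1, span + 1)]


def may_claim(now: float, accept_received_at: float, failure_period: float) -> bool:
    """True once the period carried in the accept message has expired.

    Assumption: the period is measured from receipt of the accept message.
    """
    return now >= accept_received_at + failure_period
```

For example, on a ring of 256 identifiers, a first node at 250 taking over from a second node at 3 would claim identifiers 251 through 255 and 0 through 3.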

28. The method as recited in claim 23, further comprising:
- an act of the first node detecting failure to receive an accept message from the arbitration facility within a maximum response time interval; and
  
  an act of the first node transitioning into a failure state.
- - 29. The method as recited in claim 28, wherein the first node detecting failure to receive an accept message from the arbitration facility comprises an act of detecting expiration of a recovery time interval prior to receiving an accept message.

30. The method as recited in claim 23, further comprising:
- an act of the first node receiving a deny message from the arbitration facility; and
  
  an act of the first node transitioning into a failure state.

31. The method as recited in claim 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to a third node assigned to arbitrate for the node pair including the first and second nodes.

32. The method as recited in claim 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to an arbitration facility that has global knowledge of the ring of nodes.

33. The method as recited in claim 23, wherein the act of the first node sending a report to an arbitration facility that the second node is suspected of failing comprises an act of reporting to a seed node.
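
The three outcomes a reporting node can see after sending a report (claims 24, 28, and 30) can be summarized in one function. This is a sketch only: the reply dictionary with `type` and `received_at` fields, the function name, and the returned labels are hypothetical.

```python
from typing import Optional


def reporter_outcome(reply: Optional[dict], sent_at: float,
                     max_response: float) -> str:
    """Decide what the reporting node does after reporting a suspected failure.

    reply is None if nothing arrived; otherwise it is assumed to look like
    {"type": "accept" | "deny", "received_at": <time>}.
    """
    if reply is None or reply["received_at"] - sent_at > max_response:
        # No accept within the maximum response time interval (claim 28).
        return "failed"
    if reply["type"] == "accept":
        # Accept received in time (claim 24): resources may be claimed
        # once the period carried in the accept message has passed.
        return "claim_after_period"
    # A deny message sends the reporter into the failure state (claim 30).
    return "failed"
```

Only a timely accept lets the reporter survive; both a deny and silence lead to the failure state, so at most one node of a pair continues.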

34. In a federation infrastructure including an arbitrator and a ring of nodes configured for bi-directional routing, the ring of nodes including at least the first node and a second node, a method for arbitrating a node failure report, the method comprising:
- an act of receiving a report from the first node that the second node is suspected of failing;
  
  an act of determining that no other node has suspected the first node of failing within a specified recovery time interval prior to receiving the report from the first node;
  
  an act of sending an accept message to the first node within a maximum response time interval, the accept message including a failure time value indicative of a time period after which the second node is guaranteed to transition into a failure state; and
  
  an act of recording in a list that the second node is to transition to a failure state.
- - 35. The method as recited in claim 34, further comprising:
    - an act of receiving a report from the second node that the first node is suspected of failing, the report from the second node received within the specified recovery time interval subsequent to receiving the report from the first node;
      
      an act of referring to the list to determine that the second node is to transition to a failure state; and
      
      an act of sending a deny message to the second node to cause the second node to transition into a failure state.
  - 36. The method as recited in claim 34, wherein the act of sending an accept message to the first node comprises an act of granting the first node permission to control one or more resources previously controlled by the second node.
  - 37. The method as recited in claim 36, wherein the act of granting the first node permission to control one or more resources previously controlled by the second node comprises an act of granting the first node responsibility for at least one or more ring identifiers between the first node and the second node on the ring of nodes.
  - 38. The method as recited in claim 36, wherein the act of granting the first node responsibility for at least one or more ring identifiers between the first node and the second node on the ring of nodes comprises an act of granting the first node responsibility for the identifier of the second node.
  - 39. The method as recited in claim 34, wherein the act of recording in a list that the second node is in a failure state comprises an act of adding the second node to a failed node list.
  - 40. The method as recited in claim 34, wherein the act of recording that the second node is in a failure state comprises an act of updating an entry for the second node within a failed node list to indicate the time the first node reported the second node as a failed node.
  - 41. The method as recited in claim 34, further comprising:
    - an act of detecting that a period of time equal to the specified recovery time interval has elapsed without receiving any further reports suspecting the second node of failing; and
      
      an act of removing the second node from the list.
  - 42. The method as recited in claim 34, wherein the act of receiving a report from the first node that the second node is suspected of failing comprises an act of an arbitration facility with global knowledge of the ring of nodes receiving a report from the first node.
  - 43. The method as recited in claim 34, wherein the act of receiving a report from the first node that the second node is suspected of failing comprises an act of a third node configured to arbitrate for a node pair including the first and second nodes receiving a report from the first node.
  - 44. The method as recited in claim 34, wherein the act of receiving a report from the first node that the second node is suspected of failing comprises an act of a node in an arbitration ring of nodes receiving a report from the first node.
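
The arbitration acts of claims 34, 35, and 39 to 41 can be sketched as a small class that keeps a failed node list keyed by node name. The class name `Arbitrator`, the fixed `failure_period`, and the message dictionaries are assumptions for illustration. The maximum response time interval is not modeled, because replies here are returned synchronously.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Arbitrator:
    recovery_interval: float   # window in which a counter-report is denied
    failure_period: float      # ASSUMED fixed time value sent in accept messages
    failed: Dict[str, float] = field(default_factory=dict)  # node -> time last reported

    def _expire(self, now: float) -> None:
        # Remove entries once a full recovery interval has elapsed with
        # no further reports about that node.
        for node, reported_at in list(self.failed.items()):
            if now - reported_at >= self.recovery_interval:
                del self.failed[node]

    def report(self, reporter: str, suspect: str, now: float) -> dict:
        """Handle a report from reporter that suspect is suspected of failing."""
        self._expire(now)
        if reporter in self.failed:
            # The reporter was itself reported within the recovery interval,
            # so it is told to fail.
            return {"type": "deny"}
        # Record (or refresh) the suspect's entry with the report time.
        self.failed[suspect] = now
        return {"type": "accept", "failure_period": self.failure_period}
```

If two nodes of a pair report each other, the first report to arrive is accepted and the second is denied, provided the second arrives within the recovery interval.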

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Kakivaya, Gopala; Malkhi, Dahlia; Xun, Lu; Hasha, Richard

Granted Patent: US 7,694,167 B2
US Class Current: 714/47
CPC Class Codes:
H04L 67/104   Peer-to-peer [P2P] networks
H04L 67/1046   Joining mechanisms
H04L 67/1048   Departure or maintenance me...
H04L 67/1076   Resource dissemination mech...
