Methods and systems for distributed failure detection and recovery using leasing
Abstract
A system for using a lease to detect a failure and to perform failure recovery is provided. In using this system, a client requests a lease from a server to utilize a resource managed by the server for a period of time. Responsive to the request, the server grants the lease, and the client continually requests renewal of the lease. If the client fails to renew the lease, the server detects that an error has occurred to the client. Similarly, if the server fails to respond to a renew request, the client detects that an error has occurred to the server. As part of the lease establishment, the client and server exchange failure-recovery routines that each invokes if the other experiences a failure.
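The two detection paths described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `LeaseServer`, `client_renew`, and `RenewalFailed` are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field


class RenewalFailed(Exception):
    """Raised on the client side when a renew request does not succeed."""


@dataclass
class LeaseServer:
    """Toy server (hypothetical) that grants time-limited leases to clients."""
    up: bool = True
    expiry: dict = field(default_factory=dict)  # client id -> expiry time

    def grant(self, client_id: str, now: float, duration: float) -> float:
        # Record when the lease runs out and report that time to the client.
        self.expiry[client_id] = now + duration
        return self.expiry[client_id]

    def renew(self, client_id: str, now: float, duration: float) -> float:
        if not self.up:
            raise ConnectionError("server unreachable")
        if client_id not in self.expiry or now > self.expiry[client_id]:
            raise KeyError("no live lease for " + client_id)
        return self.grant(client_id, now, duration)

    def expired_clients(self, now: float) -> list:
        # Server-side detection: leases that ran out without being renewed.
        return sorted(c for c, t in self.expiry.items() if now > t)


def client_renew(server: LeaseServer, client_id: str,
                 now: float, duration: float) -> float:
    # Client-side detection: a failed renew request signals a server failure.
    try:
        return server.renew(client_id, now, duration)
    except (ConnectionError, KeyError) as exc:
        raise RenewalFailed(str(exc)) from exc
```

In this sketch the client learns of a server failure through `RenewalFailed`, while the server learns of a client failure by polling `expired_clients`; what each side does next (the exchanged recovery routines) is left out here.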
50 Claims
-
1. A method for recovering from failures in a distributed system that includes a client and a server, said method comprising the steps of:
-
requesting by the client a lease from the server for using a resource managed by the server;
granting the lease by the server to the client for a period of time; and
detecting by the client a first failure when a request to renew the granted lease fails.
-
detecting by the server a second failure when the granted lease expires.
-
-
3. The method of claim 2, further comprising the step of:
recovering the client by rolling back the client and the resource to a prenegotiated state.
-
4. The method of claim 3, wherein the recovering step includes the step of:
invoking a method provided by the client to the server for recovering the client and the resource.
-
5. The method of claim 1, further comprising the step of:
recovering the server by rolling back the server and the resource to a prenegotiated state.
-
6. The method of claim 5, wherein the recovering step includes the step of:
invoking a method provided by the server to the client for recovering the server and the resource.
-
7. The method of claim 1, wherein the requesting step includes the step of:
sending to the server a request that includes a resource identifier and the period of time.
-
8. The method of claim 1, wherein the requesting step includes the step of:
sending to the server a request that includes a type of access to the resource requested by the client.
-
9. The method of claim 1, wherein the requesting step includes the step of:
sending to the server a request that includes a privilege level associated with the client.
-
10. The method of claim 1, wherein the requesting step includes the step of:
sending to the server an object that includes a method for recovering the client.
-
11. The method of claim 1, wherein the granting step includes the step of:
sending to the client an object that includes a method for recovering the server.
-
12. The method of claim 1, wherein the granting step includes the step of:
sending to the client an object that includes a method for renewing the lease.
-
13. The method of claim 1, wherein the granting step includes the step of:
sending to the client an object that includes a method for canceling the lease.
-
14. The method of claim 1, wherein the granting step includes the step of:
sending to the client an object that includes a method for determining the period of time of the lease.
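Claims 7 through 14 enumerate what the lease request and the returned lease object may carry. One possible shape for those two structures is sketched below; the class and field names (`LeaseRequest`, `Lease`, `access`, `privilege`, `period`) are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LeaseRequest:
    """What the client sends when asking for a lease (hypothetical layout)."""
    resource_id: str                    # claim 7: resource identifier
    duration: float                     # claim 7: requested period of time
    access: str                         # claim 8: type of access, e.g. "read"
    privilege: int                      # claim 9: privilege level of the client
    recover_client: Callable[[], str]   # claim 10: client-supplied recovery method


class Lease:
    """Object the server returns when it grants the lease (claims 11-14)."""

    def __init__(self, request: LeaseRequest, start: float,
                 recover_server: Callable[[], str]):
        self.request = request
        self.start = start
        self.expires_at = start + request.duration
        self.recover_server = recover_server  # claim 11: server recovery method
        self.cancelled = False

    def renew(self, now: float, extra: float) -> float:
        # Claim 12: extend the lease, but only while it is still live.
        if self.cancelled or now > self.expires_at:
            raise RuntimeError("lease is no longer live")
        self.expires_at = now + extra
        return self.expires_at

    def cancel(self) -> None:
        # Claim 13: give the lease up before it expires.
        self.cancelled = True

    def period(self) -> float:
        # Claim 14: total period currently covered by the lease.
        return self.expires_at - self.start
```

A client might build `LeaseRequest("file-7", 30.0, "read", 1, cleanup)` with its own `cleanup` callable and later call `lease.renew(now, 30.0)` shortly before `lease.expires_at`.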
-
15. A data processing system, comprising:
-
a memory including:
a client program containing a first code that requests a lease for using a resource and containing second code that detects a first failure when a request by the client program to extend the lease fails; and
a server program containing third code that manages the resource, containing fourth code that grants the lease to the client program for a period of time, and containing fifth code that detects a second failure when the lease expires; and
a processor for running the client program and the server program.
-
a storage device including a first object generated by the client program when requesting the lease from the server program.
-
-
19. The data processing system of claim 18, wherein the first object includes a method for recovering the client program by the server program.
-
20. The data processing system of claim 15, further comprising:
a storage device including a second object generated by the server program when granting the lease to the client program.
-
21. The data processing system of claim 20, wherein the second object includes a method for recovering the server program by the client program.
-
22. The data processing system of claim 20, wherein the second object includes a method for extending the lease by the client program.
-
23. The data processing system of claim 20, wherein the second object includes a method for canceling the lease by the client program.
-
24. A computer-readable medium containing instructions for controlling a data processing system to perform a method, the data processing system including a client and a server, said method comprising the steps of:
-
requesting by the client a lease from the server for using a resource managed by the server;
generating by the server a lease object that grants the lease to the client;
detecting by the client a first failure when a request by the client to renew the lease fails; and
detecting by the server a second failure when the granted lease expires before receiving from the client a request to cancel the lease.
-
recovering the server by invoking a recovery routine in the lease object by the client.
-
-
26. The computer-readable medium of claim 25, wherein said recovering step includes the step of:
recovering the server by rolling back the server and the resource by the client to a prenegotiated state.
-
27. The computer-readable medium of claim 24, wherein said method further comprises the step of:
recovering the client by invoking a recovery routine provided to the server by the client when the client requested the lease.
-
28. The computer-readable medium of claim 27, wherein said recovering step includes the step of:
recovering the client by rolling back the client and the resource by the server to a prenegotiated state.
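Rolling back to a prenegotiated state, as recited in claims 26 and 28, can be pictured as restoring a checkpoint that both parties agreed on when the lease was set up. The sketch below assumes the resource state fits in a dictionary and uses a deep copy as the checkpoint; `Resource`, `negotiate_checkpoint`, and `make_recovery_routine` are hypothetical names.

```python
import copy


class Resource:
    """Toy leased resource whose state is a plain dictionary (assumption)."""

    def __init__(self, data: dict):
        self.data = data


def negotiate_checkpoint(resource: Resource) -> dict:
    # Record the state both parties agree to return to after a failure.
    return copy.deepcopy(resource.data)


def make_recovery_routine(resource: Resource, checkpoint: dict):
    # Build the routine that the peer invokes once it detects a failure.
    def recover() -> None:
        # Restore a fresh copy so the checkpoint itself stays untouched.
        resource.data = copy.deepcopy(checkpoint)
    return recover
```

Because `recover` restores a copy rather than the checkpoint itself, the routine can be invoked more than once and still return the resource to the same agreed state.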
-
29. An apparatus for recovering from failures in a distributed system, comprising:
-
a requesting means for requesting a lease for using a resource and for detecting a first failure when a request by the requesting means to renew the lease fails; and
a resource allocating means for granting the lease to the client program for a period of time and for detecting a second failure when the lease expires before receiving from the requesting means a request to cancel the lease.
-
-
30. A method for recovering from failures in a distributed system that includes a client and a server, said method comprising the steps of:
-
requesting by the client a lease from the server for using a resource managed by the server for a period of time;
receiving by the client from the server a first object that grants the lease to the client and includes a method for recovering the server;
sending by the client to the server a request to renew the granted lease; and
detecting by the client a first failure when the request to renew the granted lease fails.
-
recovering the server by invoking the recovery method in the first object by the client.
-
-
32. The method of claim 30, wherein the requesting step comprises the step of:
sending to the server a second object that includes a recovery method for recovering the client when the server detects that the lease has expired before receiving from the client a request to cancel the lease.
-
33. A method for recovering from failures in a distributed system including a client and a server, said method comprising the steps of:
-
receiving by the server a request from the client for a lease for using a resource managed by the server;
granting the lease to the client for a period of time by sending to the client a first object that includes a method for recovering the server; and
detecting by the server a failure when the granted lease expires.
-
receiving a second object that includes a method for recovering the client by the server.
-
-
35. The method of claim 33, wherein the detecting step includes the step of:
detecting by the server a failure when the granted lease expires before receiving from the client a request to cancel the granted lease.
-
36. The method of claim 33, wherein the detecting step includes the step of:
detecting by the server a failure when the granted lease expires before receiving from the client a request to renew the granted lease.
-
37. A computer-readable memory device encoded with a data structure for recovering from failures in a distributed system including a client and a server, the data structure comprising:
-
an object including an identifier identifying a resource leased by the client from the server and a recovery method for recovering the server and the resource by the client when a request by the client to renew the lease fails.
-
a renewing method for renewing the granted lease when the client determines that the lease is near expiration.
-
-
39. The computer-readable memory device of claim 37, further comprising:
a canceling method for canceling the granted lease when the client completes use of the resource.
-
40. The computer-readable memory device of claim 37, further comprising:
a duration method for determining the period of time of the granted lease.
-
41. A computer-readable memory device encoded with a data structure for recovering from failures in a distributed system including a client and a server, the data structure comprising:
-
an identifier identifying a resource leased by the client from the server for a period of time; and
an object including a recovery method for recovering the client and the resource by the server when the lease expires.
-
a type of access to the resource requested by the client.
-
-
45. The computer-readable memory device of claim 43, further comprising:
a privilege level associated with the client.
-
46. A method for recovering from failures in a distributed system that includes a client and a server, said method comprising the steps of:
-
exchanging code between the client and the server during a lease negotiation for using a resource managed by the server; and
invoking the code to perform system management.
-
recovering the client by the server when the server detects a failure.
-
-
48. The method of claim 46, wherein the invoking step includes the step of:
recovering the client and the resource by the server when the server detects a failure.
-
49. The method of claim 46, wherein the invoking step includes the step of:
recovering the server by the client when the client detects a failure.
-
50. The method of claim 46, wherein the invoking step includes the step of:
recovering the server and the resource by the client when the client detects a failure.
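The exchange of code during lease negotiation in claims 46 through 50 can be sketched as a handshake in which each side hands the other a callable and later invokes the one it received. This is only an illustration under assumed names (`Server`, `Client`, `negotiate`, `on_lease_expired`, `on_renew_failed`); the recovery routines here just return strings in place of real recovery work.

```python
from typing import Callable, Dict, Optional

RecoveryRoutine = Callable[[], str]


class Server:
    """Holds the recovery routine each client supplied at negotiation time."""

    def __init__(self) -> None:
        self.client_routines: Dict[str, RecoveryRoutine] = {}

    def negotiate(self, client_id: str,
                  client_routine: RecoveryRoutine) -> RecoveryRoutine:
        # Keep the client's code and hand back the server's own in exchange.
        self.client_routines[client_id] = client_routine
        return lambda: "server recovered on behalf of " + client_id

    def on_lease_expired(self, client_id: str) -> str:
        # Server detected a failure: invoke the code the client provided.
        return self.client_routines.pop(client_id)()


class Client:
    """Holds the recovery routine the server supplied at negotiation time."""

    def __init__(self, client_id: str) -> None:
        self.client_id = client_id
        self.server_routine: Optional[RecoveryRoutine] = None

    def negotiate(self, server: Server) -> None:
        self.server_routine = server.negotiate(
            self.client_id, lambda: "client " + self.client_id + " cleaned up")

    def on_renew_failed(self) -> str:
        # Client detected a failure: invoke the code the server provided.
        if self.server_routine is None:
            raise RuntimeError("no lease has been negotiated")
        return self.server_routine()
```

The point of the design is that neither side needs built-in knowledge of how to recover the other; each simply runs whatever routine it was given when the lease was negotiated.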
Specification