Method and apparatus for transparent server failover for highly available objects
Abstract
One embodiment of the present invention provides a method and an apparatus that facilitate transparent failover from a primary copy of an object on a first server to a secondary copy of the object on a second server when the first server fails or otherwise becomes unresponsive. The method includes detecting the failure of the first server, selecting the second server, and reconfiguring the second server to act as the new primary server for the object. Additionally, the method transparently retries uncompleted invocations to the object on the second server, without requiring explicit retry commands from a client application program. A variation on this embodiment further includes winding up active invocations to the object before reconfiguring the second server to act as the new primary server. This winding-up process may include causing invocations to unresponsive nodes to unblock and complete. Another variation includes blocking new invocations to the object after detecting the failure of the first server, and unblocking these new invocations after reconfiguring the second server to act as the new primary server. Hence, the present invention can greatly simplify the programming of client application programs for highly available systems. It also makes it possible to use a client application program written for a non-highly-available system in a highly available system.
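The client-side sequence summarized in the abstract (block new invocations, wind up active ones, promote the secondary, unblock, then retry) can be sketched in Python. This is a minimal, single-process illustration under stated assumptions, not the patented implementation. `Server`, `ServerFailed`, and `FailoverReference` are hypothetical names. A failed server is modeled as a flag that makes calls fail immediately, which stands in for the step that forces invocations to unresponsive nodes to unblock and complete. The list of secondaries is assumed to be supplied by something like the system manager.

```python
import threading
from contextlib import contextmanager


class ServerFailed(Exception):
    """Raised by a hypothetical server stub whose node has failed."""


class Server:
    """Toy server holding one copy of a key/value object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.alive = True        # assumed liveness flag; a dead server fails fast
        self.state = {}

    def invoke(self, key, value):
        if not self.alive:
            raise ServerFailed(self.name)
        self.state[key] = value
        return f"{self.name}:{key}={value}"


class FailoverReference:
    """Hypothetical client-side reference that hides primary failures from callers."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._ready = threading.Event()     # set => new invocations may start
        self._ready.set()
        self._lock = threading.Lock()       # serializes failovers
        self._cond = threading.Condition()  # guards the in-flight counter
        self._in_flight = 0

    @contextmanager
    def _track(self):
        # Count an invocation as active for as long as it is dispatched to a server.
        with self._cond:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def invoke(self, key, value):
        while True:
            self._ready.wait()              # new invocations block during failover
            target = self.primary
            try:
                with self._track():
                    return target.invoke(key, value)
            except ServerFailed:
                self.fail_over(target)      # then loop: retry on the new primary

    def fail_over(self, failed):
        with self._lock:
            if self.primary is not failed:  # another caller already failed over
                return
            self._ready.clear()             # 1. block new invocations
            try:
                with self._cond:            # 2. wind up: wait for active invocations
                    self._cond.wait_for(lambda: self._in_flight == 0)
                if not self.secondaries:
                    raise RuntimeError("no secondary server available")
                self.primary = self.secondaries.pop(0)  # 3. select and promote
            finally:
                self._ready.set()           # 4. unblock new invocations
```

Because the retry loop lives inside `invoke`, a caller never sees `ServerFailed`; it sees an error only when no secondary remains. The in-flight counter plays the role of winding up: promotion waits until every invocation already dispatched to the failed primary has returned. A real system would also need timeouts or membership-driven cancellation so that calls to a hung node actually return.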
25 Claims
1. A method for providing transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the method comprising:
winding up the active invocations to the object, including causing any active invocations to unresponsive nodes to unblock and complete;
selecting the second server as a new primary server for the object upon a failure of the first server;
reconfiguring the second server to act as the new primary server for the object;
automatically retrying the active invocations which are incomplete to the object on the second server;
wherein the object has a primary copy located within a first storage device associated with the first server, and a secondary copy located within a second storage device associated with the second server, wherein the first storage device is separate from the second storage device; and
updating the secondary copy on the second server when the primary copy is updated on the first server.
blocking any new active invocations to the object after the failure of the first server; and
unblocking the new active invocations to the object after reconfiguring the second server.
4. The method of claim 1, further comprising detecting the failure of the first server.
5. The method of claim 4, wherein the operation of detecting the failure is carried out by a system manager that is distributed across at least two of the first server, the second server and a plurality of additional computer systems, so that the system manager is tolerant of server failures.
6. The method of claim 1, further comprising notifying clients of the first server that the first server has failed.
7. The method of claim 1, wherein the object includes a group of objects.
8. A method for providing transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the method comprising:
detecting a failure of the first server;
blocking any new active invocations to the object after detecting the failure of the first server;
winding up the active invocations to the object, including causing any active invocations to unresponsive nodes to unblock and complete;
selecting the second server as a new primary server for the object;
reconfiguring the second server to act as the new primary server for the object;
unblocking the new active invocations to the object after reconfiguring the second server; and
automatically retrying the active invocations which are incomplete to the object on the second server;
wherein the object has a primary copy located within a first storage device associated with the first server, and a secondary copy located within a second storage device associated with the second server, wherein the first storage device is separate from the second storage device; and
updating the secondary copy on the second server when the primary copy is updated on the first server.
12. An apparatus that provides a transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the apparatus comprising:
the first server coupled to a network;
a first storage device associated with the first server;
the second server coupled to the network;
a second storage device associated with the second server;
a system manager residing on at least one node on the network, the system manager detecting a failure of the first server and selecting the second server to act as a new primary server for the object;
a reconfiguration mechanism in communication with the system manager that reconfigures the second server to act as the new primary server for the object;
a winding up mechanism that winds up active invocations to the object before reconfiguring the second server, including causing any active invocations to unresponsive nodes to unblock and complete;
a retry mechanism in communication with the second server that automatically retries the active invocations which are incomplete to the object to the second server after the second server has been reconfigured;
wherein the object has a primary copy located within the first storage device and a secondary copy located within the second storage device, wherein the first storage device is separate from the second storage device; and
an updating mechanism in communication with the first server and the second server that is configured to update the secondary copy on the second server when the primary copy is updated on the first server.
19. An apparatus that provides a transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the apparatus comprising:
the first server coupled to a network;
a first storage device associated with the first server;
the second server coupled to the network;
a second storage device associated with the second server;
a system manager residing on at least one node on the network that detects a failure of the first server and selects the second server to act as a primary server for the object;
a winding up mechanism that winds up active invocations to the object including causing any active invocations to unresponsive nodes to unblock and complete, before the second server is reconfigured to act as a new primary server for the object;
a blocking mechanism that blocks new invocations to the object when the failure of the first server is detected, and that unblocks new invocations to the object after the second server is reconfigured;
a reconfiguration mechanism in communication with the system manager that reconfigures the second server to act as the new primary server for the object;
a retry mechanism in communication with the second server that automatically retries uncompleted invocations to the object after the second server has been reconfigured;
wherein the object has a primary copy located within the first storage device and a secondary copy located within the second storage device, wherein the first storage device is separate from the second storage device; and
an updating mechanism in communication with the first server and the second server that is configured to update the secondary copy on the second server when the primary copy is updated on the first server.
24. A program storage device storing instructions that, when executed by a computer, perform a method for providing transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the method comprising:
winding up the active invocations to the object, including causing any active invocations to unresponsive nodes to unblock and complete;
selecting the second server as a new primary server for the object upon a failure of the first server;
reconfiguring the second server to act as the new primary server for the object;
automatically retrying the active invocations which are incomplete to the object on the second server;
wherein the object has a primary copy located within a first storage device associated with the first server, and a secondary copy located within a second storage device associated with the second server, wherein the first storage device is separate from the second storage device; and
updating the secondary copy on the second server when the primary copy is updated on the first server.
25. A computer instruction signal embodied in a carrier wave carrying instructions that when executed by a computer perform a method for providing transparent failover from a first server to a second server for active invocations to an object, the first server acting as a primary server for invocations to the object, the method comprising:
winding up the active invocations to the object, including causing any active invocations to unresponsive nodes to unblock and complete;
selecting the second server as a new primary server for the object upon a failure of the first server;
reconfiguring the second server to act as the new primary server for the object;
automatically retrying the active invocations which are incomplete to the object on the second server;
wherein the object has a primary copy located within a first storage device associated with the first server, and a secondary copy located within a second storage device associated with the second server, wherein the first storage device is separate from the second storage device; and
updating the secondary copy on the second server when the primary copy is updated on the first server.
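The apparatus of claims 12 and 19 adds an updating mechanism that keeps the secondary copy, held on separate storage, current with the primary copy. As a result, the promoted server already holds the object's state when invocations are retried against it. The following is a minimal sketch under stated assumptions: `Replica`, `ReplicatedObject`, and `system_manager_tick` are hypothetical names, updates are assumed to propagate synchronously on every write, and failure detection is reduced to polling a liveness flag. The claims do not specify the propagation protocol, and a real system manager would rely on distributed membership and timeouts.

```python
class Replica:
    """Hypothetical server with its own storage device (modeled as a separate dict)."""

    def __init__(self, name):
        self.name = name
        self.alive = True       # liveness flag standing in for real failure detection
        self.storage = {}       # this server's copy of the object


class ReplicatedObject:
    """Primary/secondary object; each primary update is pushed to the secondary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def _check_primary(self):
        if not self.primary.alive:
            raise ConnectionError(f"primary {self.primary.name} is down")

    def update(self, key, value):
        self._check_primary()
        self.primary.storage[key] = value          # update the primary copy
        if self.secondary is not None and self.secondary.alive:
            self.secondary.storage[key] = value    # updating mechanism (assumed synchronous)

    def read(self, key):
        self._check_primary()
        return self.primary.storage[key]

    def promote_secondary(self):
        """Reconfiguration: the secondary becomes the new primary."""
        if self.secondary is None or not self.secondary.alive:
            raise RuntimeError("no live secondary to promote")
        self.primary, self.secondary = self.secondary, None


def system_manager_tick(obj):
    """Hypothetical system-manager step: detect a failed primary and reconfigure."""
    if obj.primary.alive:
        return False
    obj.promote_secondary()
    return True


if __name__ == "__main__":
    p, s = Replica("p"), Replica("s")
    obj = ReplicatedObject(p, s)
    obj.update("balance", 10)
    p.alive = False                    # the primary fails
    assert system_manager_tick(obj)    # the secondary is promoted
    print(obj.read("balance"))         # 10, read from the promoted copy
```

Pushing every update before `update` returns is the simplest way to guarantee that the promoted copy reflects all acknowledged writes. The cost is an extra round trip per update, which an asynchronous scheme would avoid at the price of possibly losing recent writes on failover.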
Specification