Cluster-based system and method of recovery from server failures
Abstract
A system and method are disclosed for recovering from a server failure in a computer network that contains several stand-alone (non-clustered) servers and a cluster, in which a clustered server also serves as the spare server. The cluster has one standby recovery group for each non-clustered server in the network, and each recovery group contains the IP address and network name of its associated stand-alone server. The cluster monitors the health of the stand-alone servers, preferably through a heartbeat mechanism. If the cluster detects a failure, it reassigns the LUNs owned by the failing server to the cluster. After the LUNs have been reassigned, the cluster activates the recovery group containing the IP address and network name of the failing server. The cluster then assumes the identity of the failing server and serves its users until the failing server is repaired or replaced.
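The abstract describes a fixed order of operations: reassign the failing server's LUNs first, then activate the standby recovery group that carries its IP address and network name. The patent gives no data structures, so the sketch below is a minimal, hypothetical model: the names `Cluster`, `RecoveryGroup`, `add_standalone`, and `fail_over`, the example addresses, and the choice to use the server name as its network name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryGroup:
    """Hypothetical standby identity held by the cluster for one stand-alone server."""
    ip_address: str
    network_name: str
    active: bool = False


@dataclass
class Cluster:
    name: str
    # LUN number -> name of the server that currently owns it.
    lun_owner: dict = field(default_factory=dict)
    # One standby recovery group per stand-alone server, keyed by server name.
    recovery_groups: dict = field(default_factory=dict)

    def add_standalone(self, server_name, ip_address, luns):
        """Record a stand-alone server's LUNs and create its inactive recovery group."""
        for lun in luns:
            self.lun_owner[lun] = server_name
        # Assumption: the server's network name is simply its server name.
        self.recovery_groups[server_name] = RecoveryGroup(ip_address, server_name)

    def fail_over(self, failed_server):
        """Take over a failed stand-alone server in the order the abstract gives."""
        # 1. Reassign every LUN owned by the failing server to the cluster.
        for lun, owner in self.lun_owner.items():
            if owner == failed_server:
                self.lun_owner[lun] = self.name
        # 2. Only then activate the recovery group holding its IP address and name.
        group = self.recovery_groups[failed_server]
        group.active = True
        # 3. The cluster now answers requests addressed to that identity.
        return group


cluster = Cluster("cluster")
cluster.add_standalone("fileserv1", "10.0.0.11", luns=[0, 1])
cluster.add_standalone("mailserv1", "10.0.0.12", luns=[2])
group = cluster.fail_over("fileserv1")
print(group.ip_address, group.active)  # 10.0.0.11 True
print(cluster.lun_owner)  # {0: 'cluster', 1: 'cluster', 2: 'mailserv1'}
```

Only the failed server's LUNs and recovery group change; the recovery group for `mailserv1` stays on standby.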
30 Claims
1. A system for recovering from the failure of a server in a computer network comprising one or more non-clustered servers, wherein each server possesses a network address, and one or more devices, wherein the devices are communicatively coupled to the network and associated with a LUN address and wherein each LUN address is owned by a server, comprising:
a heartbeat mechanism operable to transmit a heartbeat signal to a server and receive a response from the server, whereby a failing server may be detected if a response to the heartbeat signal is not received;
LUN management software, wherein the LUN management software is operable to re-assign the LUN addresses owned by the failing server when the failing server is detected;
a cluster group comprising at least one cluster server, wherein the cluster server is operable to receive ownership of the LUN addresses owned by the failing server; and
wherein the cluster server is running cluster software operable for creating a recovery group that is associated with the network address associated with the failing server, and wherein the recovery group is operable to receive a user or computer network request that is directed to the associated network address of the failing server, such that the cluster server is operable to serve the user or run an application associated with the failing server.
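Claim 1's heartbeat mechanism detects a failing server when a response to the heartbeat signal is not received. A minimal sketch of that detection logic follows, assuming a tolerance of `max_missed` consecutive unanswered heartbeats before a server is declared failing (the claim itself implies a single miss, which corresponds to `max_missed=1`). The class, its methods, and the `probe` callback that stands in for the real network transport are hypothetical.

```python
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Declares a server failing after `max_missed` consecutive unanswered heartbeats.

    `probe` is a hypothetical callback that sends one heartbeat to a server and
    returns True if a response came back; the real transport is out of scope here.
    """

    def __init__(self, probe: Callable[[str], bool], max_missed: int = 3) -> None:
        self.probe = probe
        self.max_missed = max_missed
        self.missed: Dict[str, int] = {}  # server name -> consecutive misses

    def watch(self, server: str) -> None:
        self.missed[server] = 0

    def tick(self) -> List[str]:
        """Run one heartbeat round and return servers that just crossed the threshold."""
        newly_failed: List[str] = []
        for server in self.missed:
            if self.probe(server):
                self.missed[server] = 0  # any response resets the count
            else:
                self.missed[server] += 1
                if self.missed[server] == self.max_missed:
                    newly_failed.append(server)  # report each failure only once
        return newly_failed


alive = {"fileserv1": True, "mailserv1": True}
monitor = HeartbeatMonitor(probe=lambda name: alive[name], max_missed=2)
for name in alive:
    monitor.watch(name)

alive["mailserv1"] = False
print(monitor.tick())  # []
print(monitor.tick())  # ['mailserv1']
print(monitor.tick())  # [] (already reported)
```

A threshold above one trades slower detection for fewer false failovers caused by a single dropped packet; whether the patented system uses one is not stated.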
10. A method for recovering from the failure of a server in a computer network comprising one or more non-clustered servers, wherein each server possesses a network address, and one or more devices, wherein the devices are communicatively coupled to the network and associated with a LUN address and wherein each LUN address is owned by a server, comprising the steps of:
providing LUN management software, wherein the LUN management software is operable to re-assign the LUN addresses owned by the failing server when the failing server is detected;
providing a cluster group comprising at least one cluster server, wherein the cluster server is operable to receive ownership of the LUN addresses owned by the failing server; and
wherein the cluster server is running cluster software operable for creating a recovery group that is associated with the network address associated with the failing server, and wherein the recovery group is operable to receive a user or computer network request that is directed to the associated network address of the failing server, such that the cluster server is operable to assume the identity of the failing server and serve the user or run an application associated with the failing server;
monitoring the status of the non-clustered servers;
detecting a failing non-clustered server;
re-assigning the LUNs owned by the failing non-clustered server to the cluster server;
activating the recovery group associated with the failing non-clustered server; and
assuming the identity of the failing non-clustered server.
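The last two steps of claim 10, activating the recovery group and assuming the identity of the failing server, mean that requests addressed to the failing server's network address are answered by the cluster server instead. The sketch below models only that routing effect as a lookup table; `AddressRouter` and its methods are hypothetical, and it deliberately says nothing about how a real cluster moves an IP address between hosts.

```python
from typing import Dict


class AddressRouter:
    """Hypothetical model of which host answers requests for each network address."""

    def __init__(self) -> None:
        self.host_for: Dict[str, str] = {}  # network address -> host now serving it
        self.standby: Dict[str, str] = {}   # network address -> cluster server holding its recovery group

    def register(self, address: str, host: str, cluster_server: str) -> None:
        """Bind an address to its stand-alone host and park an inactive recovery group."""
        self.host_for[address] = host
        self.standby[address] = cluster_server

    def activate_recovery_group(self, address: str) -> None:
        # The cluster server assumes the identity bound to this address.
        self.host_for[address] = self.standby[address]

    def deliver(self, address: str, request: str) -> str:
        """Send a user or network request to whichever host owns the address."""
        host = self.host_for[address]
        return f"{host} handled {request!r} sent to {address}"


router = AddressRouter()
router.register("10.0.0.11", "fileserv1", "clusternode1")
print(router.deliver("10.0.0.11", "open report.doc"))
# fileserv1 handled 'open report.doc' sent to 10.0.0.11
router.activate_recovery_group("10.0.0.11")
print(router.deliver("10.0.0.11", "open report.doc"))
# clusternode1 handled 'open report.doc' sent to 10.0.0.11
```

Clients keep using the same address throughout, which is the point of the recovery group: the failover is invisible to them apart from the interruption itself.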
wherein the cluster group comprises a plurality of clustered servers and wherein the clustered servers are communicatively coupled to the heartbeat mechanism and the LUN management software such that if a failing clustered server fails to respond to a heartbeat signal, another clustered server may assume the network address and LUN addresses associated with the failing clustered server; and
further comprising the steps of:
monitoring the status of the clustered servers;
detecting a failing clustered server;
re-assigning the LUNs owned by the failing cluster server to another non-failing cluster server;
activating the recovery group associated with the failing cluster server; and
assuming the identity of the failing cluster server.
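When the failing server is itself a cluster member, its LUNs and any recovery groups it was hosting must move to another non-failing cluster server. The claims do not say how that survivor is chosen; the sketch below assumes a "fewest owned LUNs" policy with ties broken by name, and the function name, node names, and data layout are hypothetical.

```python
from typing import Dict, List, Set


def fail_over_cluster_node(
    failed: str,
    live_nodes: Set[str],
    lun_owner: Dict[int, str],
    groups_on: Dict[str, List[str]],
) -> str:
    """Move a failed cluster node's LUNs and recovery groups to a surviving node.

    Assumed policy: pick the survivor owning the fewest LUNs (ties: lowest name).
    """
    survivors = sorted(live_nodes - {failed})
    if not survivors:
        raise RuntimeError("no surviving cluster server to take over")
    load = {n: sum(1 for owner in lun_owner.values() if owner == n) for n in survivors}
    target = min(survivors, key=lambda n: load[n])
    # Re-assign the failed node's LUNs to the chosen survivor.
    for lun, owner in lun_owner.items():
        if owner == failed:
            lun_owner[lun] = target
    # Move any recovery groups the failed node was hosting, then forget the node.
    groups_on.setdefault(target, []).extend(groups_on.pop(failed, []))
    return target


lun_owner = {0: "node1", 1: "node1", 2: "node2", 3: "node3", 4: "node3"}
groups_on = {"node1": ["fileserv1"], "node2": [], "node3": []}
target = fail_over_cluster_node("node1", {"node1", "node2", "node3"}, lun_owner, groups_on)
print(target)     # node2
print(groups_on)  # {'node2': ['fileserv1'], 'node3': []}
```

Here `node1` had already taken over the stand-alone server `fileserv1`, so that recovery group moves along with the LUNs.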
13. The method of claim 11, wherein the computer network comprises a SAN storage network.
14. The method of claim 13, wherein the non-clustered servers and the cluster group are communicatively coupled to the SAN storage network.
15. The method of claim 14, wherein the devices comprise storage devices.
16. The method of claim 15, wherein the non-clustered servers and the cluster group are running storage consolidation software, wherein the storage consolidation software is operable to manage the access to the storage devices by the non-clustered servers and the cluster group.
17. The method of claim 16, wherein the storage consolidation software comprises the LUN management software.
18. The method of claim 16, wherein the storage consolidation software is operable to run the heartbeat mechanism.
19. The method of claim 10, wherein the LUN management software is operable to mask the LUN addresses owned by the failing server.
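Claim 19 has the LUN management software mask the LUN addresses owned by the failing server. A plausible reason is to keep a failed server that comes back unexpectedly from writing to LUNs the cluster now owns. The sketch below models masking as a host-to-visible-LUN table; `LunMaskTable`, its methods, and the host names are hypothetical and do not reflect any particular storage array's interface.

```python
from typing import Dict, Set


class LunMaskTable:
    """Hypothetical host -> visible-LUN table; a host may only address LUNs listed for it."""

    def __init__(self) -> None:
        self.visible: Dict[str, Set[int]] = {}

    def present(self, host: str, lun: int) -> None:
        """Make a LUN visible to a host."""
        self.visible.setdefault(host, set()).add(lun)

    def can_access(self, host: str, lun: int) -> bool:
        return lun in self.visible.get(host, set())

    def mask_and_move(self, failing: str, takeover: str) -> Set[int]:
        """Hide the failing server's LUNs from it and present them to `takeover`."""
        luns = self.visible.pop(failing, set())  # masked: failing host sees nothing
        self.visible.setdefault(takeover, set()).update(luns)
        return luns


table = LunMaskTable()
table.present("fileserv1", 0)
table.present("fileserv1", 1)
table.present("clusternode1", 5)
moved = table.mask_and_move("fileserv1", "clusternode1")
print(sorted(moved))                         # [0, 1]
print(table.can_access("fileserv1", 0))      # False
print(table.can_access("clusternode1", 0))   # True
```

Masking and reassignment happen in one step here, so there is never a moment when both hosts can see the same LUN.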
20. A computer network comprising:
one or more devices communicatively coupled to the computer network, wherein the devices are each associated with a LUN address, wherein each LUN address is owned by a server coupled to the computer network;
one or more non-clustered servers, wherein each non-clustered server possesses a network address;
a cluster group comprising at least one clustered server, wherein the cluster server is operable to receive ownership of the LUN addresses owned by a failing server; and
wherein the cluster server is running cluster software operable for creating a recovery group that is associated with the network address associated with the failing server, and wherein the recovery group is operable to receive a user or computer network request that is directed to the network address of the failing server, such that the cluster server is operable to serve the user or run an application associated with the failing server.
wherein the cluster group comprises a plurality of clustered servers, wherein each clustered server possesses a network address and may possess a LUN address, and wherein the clustered servers are communicatively coupled to the heartbeat mechanism and the LUN management software such that if a failing clustered server fails to respond to a heartbeat signal, another clustered server may assume the network address and LUN addresses associated with the failing clustered server.
24. The computer network of claim 22, wherein the LUN management software is operable to mask the LUN address owned by the failing server.
25. The computer network of claim 20, wherein the computer network further comprises a SAN storage network.
26. The computer network of claim 25, wherein the non-clustered servers and the cluster group are communicatively coupled to the SAN storage network.
27. The computer network of claim 26, wherein the devices comprise storage devices.
28. The computer network of claim 27, wherein the non-clustered servers and the cluster group are running storage consolidation software, wherein the storage consolidation software is operable to manage the access to the storage devices by the non-clustered servers and the cluster group.
29. The computer network of claim 28, wherein the storage consolidation software comprises the LUN management software.
30. The computer network of claim 28, wherein the storage consolidation software is operable to run the heartbeat mechanism.
Specification