SYSTEM AND METHOD FOR HANDLING MULTI-NODE FAILURES IN A DISASTER RECOVERY CLUSTER
First Claim
1. A method comprising:
- determining that a candidate node is not available for a switchover operation;
- identifying an alternate node for the switchover operation;
- determining whether the identified alternate node is capable of handling a load from a plurality of other nodes;
- in response to determining that the identified alternate node is capable of handling the load from the plurality of other nodes, performing a switchover operation to transfer ownership of one or more objects from the plurality of other nodes to the identified alternate node;
- recovering data from a non-volatile memory to the one or more objects; and
- bringing online the one or more objects.
1 Assignment
0 Petitions
Abstract
A system and method for handling multi-node failures in a disaster recovery cluster is provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory is recovered by the surviving nodes to bring storage objects, e.g., disks, aggregates, and/or volumes, into a consistent state.
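The flow summarized in the abstract, and recited step by step in claim 1, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Node` and `StorageObject` classes, the numeric `capacity` and `load` fields, the rule that an alternate qualifies when its current load plus the failed nodes' load fits within its capacity, and the `nvram` dictionary that stands in for logged writes in non-volatile memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    available: bool = True
    capacity: int = 100   # hypothetical units of load the node can serve
    load: int = 0         # hypothetical load the node serves now


@dataclass
class StorageObject:
    name: str             # e.g. a disk, aggregate, or volume
    owner: str
    online: bool = False
    blocks: list = field(default_factory=list)


def can_handle(alternate, failed_nodes):
    """Assumed capacity test: current load plus all failed-node load must fit."""
    incoming = sum(n.load for n in failed_nodes)
    return alternate.available and alternate.load + incoming <= alternate.capacity


def switch_over(candidate, alternates, failed_nodes, objects, nvram):
    """Move the failed nodes' objects to an alternate node; return it, or None."""
    if candidate.available:
        return None  # this sketch only covers the case of an unavailable candidate
    failed_names = {n.name for n in failed_nodes}
    for alternate in alternates:
        if not can_handle(alternate, failed_nodes):
            continue
        for obj in objects:
            if obj.owner in failed_names:
                obj.owner = alternate.name                  # transfer ownership
                obj.blocks.extend(nvram.pop(obj.name, []))  # replay logged writes
                obj.online = True                           # bring the object online
        alternate.load += sum(n.load for n in failed_nodes)
        return alternate
    return None
```

The sketch returns `None` when no alternate passes the capacity test. What a real cluster does in that case is not stated in the claim and is left open here.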
15 Citations
20 Claims
1. A method comprising:
- determining that a candidate node is not available for a switchover operation;
- identifying an alternate node for the switchover operation;
- determining whether the identified alternate node is capable of handling a load from a plurality of other nodes;
- in response to determining that the identified alternate node is capable of handling the load from the plurality of other nodes, performing a switchover operation to transfer ownership of one or more objects from the plurality of other nodes to the identified alternate node;
- recovering data from a non-volatile memory to the one or more objects; and
- bringing online the one or more objects.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
- a first high availability pair comprising of a first and a second node operatively interconnected by a first cluster interconnect, the first node associated with first data storage objects and the second node associated with second data storage objects;
- a second high availability pair comprising of a third and a fourth node operatively interconnected by a second cluster interconnect, the third node associated with third data storage objects and the fourth node associated with fourth data storage objects, the first and second high availability pairs organized as a disaster recovery group;
- wherein the first node is configured to perform a takeover operation of the second data storage objects in response to an error condition of the second node; and
- wherein the third node is configured to perform a switchover operation to manage the first data storage objects and the fourth node is configured to perform a switchover operation to manage the second data storage objects in response to a subsequent error condition affecting the first node.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
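The failure sequence in claim 10 (a takeover inside one high availability pair, followed by a switchover to the other pair) can be traced with a small Python model. This is only a sketch under assumptions: the `DRGroup` class, the node names `node1` to `node4`, and the fixed partner tables are hypothetical. The pairing of the first node with the third and the second with the fourth is inferred from the claim's statement of which node manages which objects.

```python
class DRGroup:
    """Toy model of two HA pairs organized as one disaster recovery group."""

    def __init__(self):
        # Which node currently serves each set of data storage objects.
        self.owner = {"first": "node1", "second": "node2",
                      "third": "node3", "fourth": "node4"}
        self.home = dict(self.owner)  # original owner of each object set
        self.ha_partner = {"node1": "node2", "node2": "node1",
                           "node3": "node4", "node4": "node3"}
        # Assumed disaster recovery pairing across the two HA pairs.
        self.dr_partner = {"node1": "node3", "node2": "node4",
                           "node3": "node1", "node4": "node2"}
        self.failed = set()

    def fail(self, node):
        """Record an error condition on `node` and reassign what it served."""
        self.failed.add(node)
        ha = self.ha_partner[node]
        for objects, owner in list(self.owner.items()):
            if owner != node:
                continue
            if ha not in self.failed:
                # Takeover: the surviving HA partner serves the objects.
                self.owner[objects] = ha
            else:
                # Switchover: the DR partner of the objects' home node serves them.
                self.owner[objects] = self.dr_partner[self.home[objects]]


group = DRGroup()
group.fail("node2")   # node1 takes over the second data storage objects
group.fail("node1")   # node3 and node4 switch over the first and second objects
print(group.owner)
```

The model does not check whether the disaster recovery partner is itself available. Choosing an alternate node in that situation is the subject of claim 1.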
19. A computer readable medium, including program instructions executable on a processor, the computer readable medium comprising:
- program instructions that determine that a candidate node is not available for a switchover operation;
- program instructions that identify an alternate node for the switchover operation;
- program instructions that determine whether the identified alternate node is capable of handling a load from a plurality of other nodes;
- in response to determining that the identified alternate node is capable of handling the load from the plurality of other nodes, program instructions that perform a switchover operation to transfer ownership of one or more objects from the plurality of other nodes to the identified alternate node;
- program instructions that recover data from a non-volatile memory to the one or more objects; and
- program instructions that bring online the one or more objects.
Dependent claims: 20
Specification