Two-node high availability cluster storage solution using an intelligent initiator to avoid split brain syndrome
Abstract
Techniques for maintaining mirrored storage cluster data consistency on systems with two-node, highly available storage solutions can employ an initiator-side agent operable to prevent split-brain scenarios. Split brain syndrome can be avoided, information identifying changes of synchronization states can be maintained, and both graceful and ungraceful shutdowns (or failures) of either one of the nodes or of the intelligent initiator itself can be mitigated. Technology presented herein supports load balancing and hot failover/failback in systems that may feature redundant network connectivity. Moreover, a method is supported for communicating storage cluster status between the storage nodes and the initiator.
18 Claims
1. A method for mitigating split brain syndrome in a storage cluster, the method comprising:
- maintaining state information related to a first storage node in a device specific module running on an initiator, the initiator issuing read or write input/output (I/O) commands to the first storage node, and receiving the state information related to the first storage node piggy-backed together with a response to the read or write I/O commands, the state information related to the first storage node comprising an epoch number, a status of synchronization and a status of being a primary node or a secondary node;
- maintaining state information related to a second storage node in the device specific module running on the initiator, the initiator issuing read or write I/O commands to the second storage node, and receiving the state information related to the second storage node piggy-backed together with a response to the read or write I/O commands, the state information related to the second storage node comprising an epoch number, a status of synchronization and a status of being a primary node or a secondary node;
- switching, by the device specific module, operation between the first storage node and the second storage node in response to a failed one of the first storage node and the second storage node;
- reconciling, by the device specific module, state information upon recovery of the failed storage node using the state information in the device specific module, wherein reconciling state information comprises comparing the epoch number related to the first storage node and the epoch number related to the second storage node, wherein the reconciled state information within the first storage node and the second storage node prevents split brain conflicts by designating one of the first storage node and the second storage node as the primary node or the secondary node, respectively; and
- demoting one of the first storage node or the second storage node from the primary node to the secondary node, wherein the demoted first or second storage node has a lower epoch number.
Dependent claims: 2, 3, 4, 5
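The reconciling and demoting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `NodeState`, `Role` and `reconcile` are hypothetical, and the handling of equal epoch numbers is an assumption, since the claim only states that the node with the lower epoch number is demoted.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Tuple


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass(frozen=True)
class NodeState:
    """Per-node state kept by the initiator (field names are illustrative)."""
    epoch: int      # epoch number reported by the storage node
    in_sync: bool   # status of synchronization
    role: Role      # primary or secondary


def reconcile(first: NodeState, second: NodeState) -> Tuple[NodeState, NodeState]:
    """Compare epoch numbers and demote the node with the lower one.

    Returns the reconciled states in the same order as the arguments.
    """
    if first.epoch == second.epoch:
        # Assumption: the claim does not cover a tie, so roles stay unchanged.
        return first, second
    if first.epoch > second.epoch:
        return (replace(first, role=Role.PRIMARY),
                replace(second, role=Role.SECONDARY))
    return (replace(first, role=Role.SECONDARY),
            replace(second, role=Role.PRIMARY))
```

In this sketch, two nodes that both report themselves as primary after a recovery end up with exactly one primary: the node whose epoch number is higher.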
6. A non-transitory computer storage medium having computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to:
- issue read and write input/output (I/O) commands to read and write data to a first storage node;
- issue read and write I/O commands to read and write data to a second storage node;
- query state information from the first storage node and the second storage node, the state information comprising an epoch number, a status of synchronization and a status of being a primary storage node or a secondary storage node;
- receive the state information related to the first storage node and the second storage node piggy-backed together with a response to the read and write I/O commands;
- maintain queried state information in a device specific module at an initiator;
- failover between the first storage node and the second storage node in response to a failure by using the state information maintained by the device specific module;
- reconcile, at the device specific module, state information upon recovery from the failure wherein reconciling state information comprises comparing the epoch number related to the first storage node and the epoch number related to the second storage node; and
- demote one of the first storage node or the second storage node from the primary storage node to the secondary storage node upon recovery from the failure, wherein the demoted first or second storage node has a lower epoch number.
Dependent claims: 7, 8, 9, 10, 11, 12
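The I/O and failover steps of claim 6 might look roughly like the following Python sketch. Everything here is illustrative: the `Transport` callable, the `DeviceSpecificModule` class and the fake transports are hypothetical stand-ins, and the policy of refusing to fail over to a node last known to be out of sync is an assumption rather than something the claim specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class NodeState:
    epoch: int
    in_sync: bool
    is_primary: bool


# Hypothetical transport: sends one command, returns (payload, piggy-backed state),
# and raises ConnectionError when the storage node cannot be reached.
Transport = Callable[[str], Tuple[bytes, NodeState]]


class DeviceSpecificModule:
    """Initiator-side module tracking two storage nodes (illustrative only)."""

    def __init__(self, transports: Dict[str, Transport],
                 states: Dict[str, NodeState], active: str) -> None:
        self.transports = transports
        self.states = dict(states)   # last known state per node, e.g. from a query
        self.active = active

    def _send(self, node: str, command: str) -> bytes:
        payload, state = self.transports[node](command)
        self.states[node] = state    # refresh state piggy-backed on the response
        return payload

    def submit(self, command: str) -> bytes:
        """Send a command to the active node, failing over once on error."""
        try:
            return self._send(self.active, command)
        except ConnectionError:
            survivor = next(n for n in self.transports if n != self.active)
            # Assumed policy: only fail over to a node last known to be in sync.
            if not self.states[survivor].in_sync:
                raise RuntimeError("surviving node is not synchronized")
            self.active = survivor
            return self._send(survivor, command)


def healthy(name: str, state: NodeState) -> Transport:
    """Fake transport that echoes the command and piggy-backs a fixed state."""
    def transport(command: str) -> Tuple[bytes, NodeState]:
        return f"{name}:{command}".encode(), state
    return transport


def dead(command: str) -> Tuple[bytes, NodeState]:
    """Fake transport for a failed node."""
    raise ConnectionError("node down")
```

With node `a` failed and node `b` healthy, a call to `submit` is served by `b`, and the module's record for `b` is replaced by the state that arrived with the response.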
13. A networked data storage system comprising:
- a first storage node;
- a second storage node in a mirrored configuration with the first storage node;
- an initiator node configured to issue read and write input/output (I/O) commands to read and write data to the first storage node and the second storage node, the initiator node receiving a response to the read and write I/O commands together with state information from the first storage node and the second storage node, the state information comprising an epoch number, a status of synchronization and a status of being a primary node or a secondary node, respectively; and
- a device specific module, running on the initiator node, operable to mitigate a split brain scenario related to the first storage node and the second storage node in accordance with the state information,
  wherein the initiator node, using the device specific module, operates to reconcile state information upon recovery from a failure, wherein reconciling state information comprises comparing the epoch number related to the first storage node and the epoch number related to the second storage node,
  wherein the initiator node, using the device specific module, operates to mitigate the split brain scenario using state information associated with the first storage node and the second storage node gathered during input/output (I/O) operations,
  wherein the initiator node, using the device specific module, designates one of the first storage node and the second storage node as the primary node or the secondary node, respectively, and
  wherein the initiator node, using the device specific module, operates to demote one of the first storage node or the second storage node from the primary node to the secondary node, wherein the demoted first or second storage node has a lower epoch number.
Dependent claims: 14, 15, 16, 17, 18
Specification