System and method for establishing bi-directional failover in a two node cluster
Abstract
A system and method for permitting bi-directional failover in two node clusters utilizing quorum-based data replication. In response to detecting an error in its partner, the surviving node establishes itself as the primary of the cluster and sets a first persistent state in its local unit. A temporary epsilon value for quorum voting purposes is then assigned to the surviving node, which causes it to be in quorum. A second persistent state is stored in the local unit, and the surviving node comes online as a result of being in quorum.
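The quorum arithmetic described in the abstract can be illustrated with a minimal sketch. It assumes that epsilon acts as a fractional tie-breaker weight added to the holder's vote and that quorum requires strictly more than half of the cluster's votes; the weight of one half, the function name in_quorum and the node names are hypothetical and do not come from the patent text.

```python
from fractions import Fraction

# Assumed tie-breaker weight: more than zero, less than one full vote.
EPSILON = Fraction(1, 2)


def in_quorum(healthy_nodes, cluster_size, epsilon_holder=None):
    """Return True if the healthy nodes hold a strict majority of votes.

    Each healthy node contributes one vote. If the (hypothetical)
    epsilon holder is among the healthy nodes, its fractional weight
    is added to the tally.
    """
    votes = Fraction(len(healthy_nodes))
    if epsilon_holder in healthy_nodes:
        votes += EPSILON
    return votes > Fraction(cluster_size, 2)


# Two healthy nodes, no epsilon: conventional quorum.
both_up = in_quorum({"node_a", "node_b"}, 2)
# One survivor, no epsilon: one vote is not a majority of two.
survivor_alone = in_quorum({"node_a"}, 2)
# One survivor holding a temporary epsilon: back in quorum.
survivor_with_epsilon = in_quorum({"node_a"}, 2, epsilon_holder="node_a")
```

Under these assumptions a lone survivor in a two node cluster can never reach a majority on its own, which is why the temporary epsilon is what brings it back into quorum.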
31 Claims
1. A method for providing bi-directional failover for data replication services in a two node cluster, comprising:
- detecting a failure of one of the nodes; and
- in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein in the high availability state a single node is designated as a stand alone node that is a full read/write replica for the data replication services of the cluster, thereby enabling management services reliant on updates for replicated data to function normally, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
8. A method for providing a bi-directional failover in a cluster comprising a first node and a second node, comprising:
- providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value;
- detecting, by the first node, an error condition on the second node;
- setting, by the first node, a local cached activity lock identifying the first node as active in the cluster;
- setting a first persistent state in a local unit of the first node;
- assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value; and
- setting a second persistent state in the local unit of the first node, wherein the second persistent state is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster.
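The ordering of the takeover steps recited in claim 8 can be sketched as follows. This is only an illustration under stated assumptions: the Node class, the take_over function and the state names TAKEOVER_STARTED and HA_STANDALONE are hypothetical, and an in-memory list stands in for the persistent local unit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    activity_lock_owner: Optional[str] = None  # local cached activity lock
    has_epsilon: bool = False                  # temporary epsilon for quorum voting
    online: bool = False
    # Stand-in for the persistent local unit: every state written, in order.
    local_unit: List[str] = field(default_factory=list)


def take_over(node: Node, partner_failed: bool) -> bool:
    """Walk the surviving node through the takeover steps in claim order."""
    if not partner_failed:
        return False
    # Step 1: record this node as the active node in the cluster.
    node.activity_lock_owner = node.name
    # Step 2: write the first persistent state (hypothetical name).
    node.local_unit.append("TAKEOVER_STARTED")
    # Step 3: assign the temporary epsilon so the node enters quorum.
    node.has_epsilon = True
    # Step 4: write the second persistent state, the high availability state.
    node.local_unit.append("HA_STANDALONE")
    # Being in quorum, the node can now come online as a read/write replica.
    node.online = True
    return True


first_node = Node("first")
took_over = take_over(first_node, partner_failed=True)
```

Writing a persistent state before and after the epsilon assignment means that, in this sketch, a restart partway through the sequence would leave a record of how far the takeover had progressed.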
14. A computer readable medium for providing a bi-directional failover in a cluster comprising a first node and a second node, the computer readable medium including program instructions for performing the steps of:
- providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value;
- detecting, by the first node, an error condition on the second node;
- setting, by the first node, a local cached activity lock identifying the first node as a primary of the cluster;
- setting a first persistent state in a local unit of the first node;
- assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value; and
- setting a second persistent state in the local unit of the first node, wherein the second persistent state is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster.
16. A system for providing a bi-directional failover in a cluster comprising a first node and a second node, the system comprising:
- a storage operating system executed by a processor on the first node and the storage operating system having a replicated database (RDB), the RDB comprising a quorum manager configured to assign a temporary epsilon value to the first node in response to detecting an error condition in the second node, the temporary epsilon causing the first node to be in quorum and to allow the second node to come online to form the cluster between the first node and the second node, wherein the RDB further comprises a recovery manager configured to set a lock in a data structure identifying the first node as the owner of an HA activity lock in the cluster and further configured to set a first persistent state value in a local unit of the first node.
18. A computer readable medium for providing bi-directional failover among nodes of a two node replicated data cluster, the computer readable medium including program instructions for performing the steps of:
- detecting a failure of one of the nodes; and
- in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein in the high availability state a single node is designated as a full read/write replica within the cluster data replication service, the full read/write replica modifying configuration information relating to one or more replicated services provided by the replicated services cluster, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
19. A system to provide bi-directional failover for data replication services in a two node cluster, comprising:
- in response to detecting a failure, a disk element module executed by a processor, the disk element module configured to designate a first node of the two nodes as a stand alone node that is a full read/write replica for the data replication services of the cluster, thereby enabling management services reliant on updates for replicated data to function normally; and
- a quorum manager configured to assign a temporary epsilon value to the first node in response to detecting an error condition in a second node, the temporary epsilon causing the first node to be in quorum and to allow the second node to come online to form the cluster between the first node and the second node.
25. A method for providing bi-directional failover for data replication services in a two node cluster, comprising:
- detecting a failure of one of the nodes; and
- in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein a single node is designated as a full read/write replica for the data replication services of the cluster by storing in a lock associated with the single node in a disk element, thereby enabling management services reliant on updates for replicated data to function normally, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
26. A method for providing a bi-directional failover in a cluster comprising a first node and a second node, comprising:
- providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value;
- detecting, by the first node, an error condition on the second node;
- setting, by the first node, a local cached activity lock identifying the first node as active in the cluster;
- setting a first persistent state in a local unit of the first node;
- assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value;
- setting a second persistent state in the local unit of the first node, wherein the second persistent state is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster;
- detecting, by the first node, the post-failure presence of the second node;
- performing a resynchronization routine between the first and second nodes; and
- removing the temporary epsilon value from the first node.
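The recovery steps that close claim 26 (detecting the returning node, resynchronizing, then removing the temporary epsilon) might look like the sketch below. It assumes the survivor was the only writer while stand alone, so its copy of the replicated data is treated as authoritative; the rejoin function and the dictionary keys are hypothetical, and the claim itself does not say how resynchronization is carried out.

```python
def rejoin(survivor: dict, returning: dict) -> bool:
    """Bring a returning node back and drop the survivor's temporary epsilon.

    Both arguments are plain dictionaries standing in for node state.
    Returns False if the returning node has not been detected yet.
    """
    # Step 1: only proceed once the partner's presence has been detected.
    if not returning.get("present", False):
        return False
    # Step 2: resynchronize. Assumption: the survivor's data is authoritative
    # because it was the sole read/write replica during the outage.
    returning["data"] = dict(survivor["data"])
    # Step 3: remove the temporary epsilon; with both nodes healthy the
    # cluster again has a conventional majority without it.
    survivor["has_epsilon"] = False
    return True


first = {"data": {"config_version": 7}, "has_epsilon": True}
second = {"data": {"config_version": 5}, "present": True}
rejoined = rejoin(first, second)
```

Removing epsilon only after the copy completes keeps the survivor in quorum for the whole resynchronization in this sketch, so updates are never left without a read/write replica.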
Specification