Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
First Claim
1. A system for providing consistent operational data of a previous server cluster to a new server duster, comprising, a plurality of nodes, a plurality of replica members, each of the replica members maintaining an epoch number indicative of a state of the cluster operational data, at least one replica member having updated cluster operational data stored thereon by a first node including information indicative of a quorum requirement of a number of replica members needed to form a cluster, and a duster service on a second node configured to 1) obtain control of a replica set of a number of replica members, 2) compare the number of replica members in the replica set with the quorum requirement, 3) form the new server cluster if the quorum requirement is met by the number of replica members in the replica set, and 4) determine which of the replica members of the replica set has data that is most updated.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and system for increasing server cluster availability by requiring at a minimum only one node and a quorum replica set of replica members to form and operate a cluster. Replica members, independent from the nodes, maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster includes consistent cluster operational data via at least one replica member from the immediately prior cluster. Arbitration provides exclusive ownership by one node of the replica members, including at cluster formation, and when the owning node fails. Arbitration uses a fast mutual exclusion algorithm and a reservation mechanism to challenge for and defend the exclusive reservation of each member. A quorum replica set algorithm brings members online and offline with data consistency, including updating unreconciled replica members, and ensures consistent read and update operations.
157 Citations
25 Claims
- 1. A system for providing consistent operational data of a previous server cluster to a new server duster, comprising, a plurality of nodes, a plurality of replica members, each of the replica members maintaining an epoch number indicative of a state of the cluster operational data, at least one replica member having updated cluster operational data stored thereon by a first node including information indicative of a quorum requirement of a number of replica members needed to form a cluster, and a duster service on a second node configured to 1) obtain control of a replica set of a number of replica members, 2) compare the number of replica members in the replica set with the quorum requirement, 3) form the new server cluster if the quorum requirement is met by the number of replica members in the replica set, and 4) determine which of the replica members of the replica set has data that is most updated.
-
9. A computer-implemented method, comprising:
-
maintaining cluster operational date on a replica set comprising a plurality of replica members that are each independent of any node of a server cluster;
representing the cluster at a node if the number of replica members controlled by the node comprises at least a majority of the total number of replica members configured to operate in the cluster; and
determining which of the replica members of the replica set has operational data that is most updated, including maintaining an epoch number in association with each replica member, and replicating at least some of that operational data to the other replica members of the replica set. - View Dependent Claims (10, 11)
-
-
12. A computer-implemented method, comprising:
-
maintaining cluster operational data on a replica set comprising a plurality of replica members that are each independent of any node of a server cluster;
representing the cluster at a node if the number of replica members controlled by the node comprises at least a majority of the total number of replica members configured to operate in the cluster;
determining which of the replica members of the replica set has operational data that is most updated, and replicating at least some of that operational data to the other replica members of the replica set; and
evaluating a last record logged on a replica member to which data is being replicated, against at least one record of the replicated data, to determine whether to discard the last record. - View Dependent Claims (13)
-
-
14. A computer-implemented method, comprising:
-
maintaining cluster operational data on a replica set comprising a plurality of replica members that are each independent of any node of a server cluster;
representing the cluster at a node if the number of replica members controlled by the node comprises at least a majority of the total number of replica members configured to operate in the cluster;
determining which of the replica members of the replica set has operational data that is most updated, and replicating at least some of that operational data to the other replica members of the replica set; and
detecting the unavailability of a replica member that was operational, determining whether the majority of replica members still exists, and if not, halting updates to the cluster configuration data. - View Dependent Claims (15)
-
-
16. A computer-implemented method, comprising:
-
maintaining cluster operational data on a replica set comprising a plurality of replica members that are each independent of any node of a server cluster;
representing the cluster at a node if the number of replica members controlled by the node comprises at least a majority of the total number of replica members configured to operate in the cluster, wherein the node controls the majority of replica members by arbitrating for exclusive ownership of each member, including, issuing a reset command, delaying for a period of time, and issuing a reserve command; and
determining which of the replica members of the replica set has operational data that is most updated, and replicating at least some of that operational data to the other replica members of the replica set.
-
-
17. A computer-implemented method, comprising:
-
maintaining cluster operational data on a replica set comprising a plurality of replica members that are each independent of any node of a server cluster;
representing the cluster at a node if the number of replica members controlled by the node comprises at least a majority of the total number of replica members configured to operate in the cluster, wherein the node controls the majority of replica members by arbitrating for exclusive ownership of each member, including, issuing a reset command; and
determining which of the replica members of the replica set has operational data that is most updated, and replicating at least some of that operational data to the other replica members of the replica set.
-
-
18. A computer-implemented method of operating a server cluster of at least three nodes, comprising:
-
storing cluster operational data on a replica set of at least one replica member, each replica member being independent from any node;
at a first node, arbitrating with at least two other nodes for control of the replica set, the arbitration being performed for each replica member and comprising, attempting to obtain a right to exclusively reserve that replica member wherein attempting to obtain a right to exclusively reserve that replica member includes, attempting to write a unique identifier to a location on the replica member, delaying, and reading from the location to determine whether the unique identifier is unchanged, and if the attempt is successful, exclusively reserving that replica member; and
representing the cluster at the first node if the replica set is controlled thereby and has consistent cluster operational data with respect to a previous cluster.
-
-
19. A computer-implemented method of operating a server cluster of at least three nodes, comprising:
-
storing cluster operational data on a replica set of at least one replica member, each replica member being independent from any node;
at a first node, arbitrating with at least two other nodes for control of the replica set, the arbitration being performed for each replica member and comprising, attempting to obtain a right to exclusively reserve that replica member, and if the attempt is successful, exclusively reserving that replica member, wherein arbitrating for each replica member includes, issuing a reset command for the replica member, delaying for a period of time, and issuing a reserve command for the replica member; and
representing the cluster at the first node if the replica set is controlled thereby and has consistent cluster operational data with respect to a previous cluster.
-
-
20. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof;
detecting a status change of one replica member with respect to the available set; and
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto, wherein taking action in response to the changed status comprises running a recovery process to make the replica members consistent including increasing an epoch number maintained on each available replica member.
-
-
21. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof;
detecting a status change of one replica member with respect to the available set; and
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto, wherein taking action in response to the changed status comprises running a recovery process to make the replica members consistent including looking for a non-committed update that was not committed before a subsequent committed update on at least one available replica member, and discarding each such non-committed update found.
-
-
22. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof, wherein a majority of replica members does not still exist;
detecting a status change of one replica member with respect to the available set; and
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto, wherein taking action in response to the changed status further includes preventing updates from being written to replica members that remain available.
-
-
23. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof;
detecting a status change of one replica member with respect to the available set, wherein detecting a status change includes attempting to write an update to each available replica member, receiving success or failure information for each attempted write, and determining whether a majority of replica members still exists by evaluating a number of successful writes against a number required for a majority; and
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto.
-
-
24. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof;
detecting a status change of one replica member with respect to the available set, wherein detecting a status change includes attempting to write an update to each available replica member, receiving success or failure information for each attempted write, determining whether a majority of replica members still exists by evaluating a number of successful writes against a number required for a majority, and reporting that the update succeeded if the number of successful writes is greater than or equal to the number required for a majority; and
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto.
-
-
25. A computer-readable medium having computer-executable instructions, comprising:
-
representing a cluster by obtaining exclusive control of a majority of replica members in an available set thereof;
detecting a status change of one replica member with respect to the available set;
taking action in response to the changed status to ensure that the replica members are consistent with respect to any update logged thereto; and
preventing further updates unless the number of successful writes is greater than or equal to the number required for a majority.
-
Specification