System and method for using cluster level quorum to prevent split brain scenario in a data grid cluster
First Claim
1. A system for using cluster quorum to prevent split brain scenario in a distributed data grid cluster, comprising:
a data grid cluster comprising a plurality of cluster nodes, wherein each cluster node comprises a microprocessor;
a plurality of cluster services, wherein each cluster service runs on a particular cluster node of said plurality of cluster nodes in the data grid cluster and collects and maintains statistics regarding communication flow between the particular cluster node and other cluster nodes in the data grid cluster, and wherein the statistics are used by the data grid cluster to determine a status associated with each other cluster node in the data grid cluster when a disconnection event happens in the data grid cluster;
a cluster quorum policy defined in a cache configuration file associated with the data grid cluster, wherein the cluster quorum policy specifies a minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, and a time period that the data grid cluster defers to make a decision on whether or not to evict one or more cluster nodes from the data grid cluster; and
wherein, when one or more cluster nodes are detected to have been disconnected (disconnected nodes) from the data grid cluster as a result of a disconnection event, the data grid cluster defers to make a decision on whether or not to evict the disconnected nodes from the data grid cluster for the time period specified in the cluster quorum policy,
if a connection is reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy, the data grid cluster does not evict the disconnected nodes; and
if the connection is not reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy and a number of cluster nodes remaining in the cluster after excluding the disconnected nodes is equal to at least the minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, the data grid cluster does evict the disconnected nodes.
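The decision rule recited in the final elements of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `ClusterQuorumPolicy`, `Decision`, and `decide` are hypothetical, and the `NO_QUORUM` outcome is an assumption, since the claim does not say what happens when too few nodes remain.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    KEEP = "keep"            # connection came back within the deferral period
    EVICT = "evict"          # period expired and enough nodes remain
    NO_QUORUM = "no_quorum"  # assumption: the claim is silent on this case


@dataclass(frozen=True)
class ClusterQuorumPolicy:
    # Hypothetical field names; the claim only says the policy specifies
    # a minimum node count and a deferral time period.
    min_nodes: int
    defer_seconds: float


def decide(policy: ClusterQuorumPolicy,
           cluster_size: int,
           disconnected_count: int,
           seconds_until_reconnect: Optional[float]) -> Decision:
    """Apply the rule from claim 1.

    seconds_until_reconnect is None when the connection never comes back.
    """
    # Reconnected prior to expiration of the time period: do not evict.
    if (seconds_until_reconnect is not None
            and seconds_until_reconnect < policy.defer_seconds):
        return Decision.KEEP
    # Otherwise count the nodes remaining after excluding the disconnected ones.
    remaining = cluster_size - disconnected_count
    if remaining >= policy.min_nodes:
        return Decision.EVICT
    return Decision.NO_QUORUM
```

For example, with a minimum of 3 nodes and a 30 second deferral, a 5-node cluster that loses 2 nodes for good evicts them, while the same cluster losing 3 nodes is not permitted to make an eviction decision.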
Abstract
A system and method are described for use with a data grid cluster, which uses cluster quorum to prevent a split brain scenario. The data grid cluster includes a plurality of cluster nodes, each of which runs a cluster service. Each cluster service collects and maintains statistics regarding communication flow between its cluster node and the other cluster nodes in the data grid cluster. The statistics are used to determine a status associated with other cluster nodes in the data grid cluster whenever a disconnect event happens. The data grid cluster is associated with a quorum policy, which is defined in a cache configuration file, and which specifies a time period that a cluster node will wait before making a decision on whether or not to evict one or more cluster nodes from the data grid cluster.
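The abstract does not say which communication statistics a cluster service keeps or how a status is derived from them. As one possible reading, the sketch below tracks per-peer send and acknowledgement counts plus the time a peer was last heard from, and reports a peer as unreachable once it has been silent for longer than a limit. The class `CommunicationStats`, its methods, and the silence-based rule are all assumptions made for illustration.

```python
class CommunicationStats:
    """Per-node record of traffic exchanged with each peer (hypothetical)."""

    def __init__(self):
        self._sent = {}        # peer -> number of messages sent
        self._acked = {}       # peer -> number of acknowledgements received
        self._last_heard = {}  # peer -> timestamp of the latest acknowledgement

    def record_sent(self, peer):
        self._sent[peer] = self._sent.get(peer, 0) + 1

    def record_ack(self, peer, now):
        self._acked[peer] = self._acked.get(peer, 0) + 1
        self._last_heard[peer] = now

    def ack_ratio(self, peer):
        """Fraction of sent messages that were acknowledged (0.0 if none sent)."""
        sent = self._sent.get(peer, 0)
        return self._acked.get(peer, 0) / sent if sent else 0.0

    def status(self, peer, now, silence_limit):
        """Assumed rule: a peer is unreachable if never heard from,
        or silent for longer than silence_limit seconds."""
        last = self._last_heard.get(peer)
        if last is None or now - last > silence_limit:
            return "unreachable"
        return "reachable"
```

A node could consult `status` for every peer when a disconnect event is detected, and treat the peers reported as unreachable as the candidate disconnected nodes.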
20 Claims
1. A system for using cluster quorum to prevent split brain scenario in a distributed data grid cluster, comprising:
a data grid cluster comprising a plurality of cluster nodes, wherein each cluster node comprises a microprocessor;
a plurality of cluster services, wherein each cluster service runs on a particular cluster node of said plurality of cluster nodes in the data grid cluster and collects and maintains statistics regarding communication flow between the particular cluster node and other cluster nodes in the data grid cluster, and wherein the statistics are used by the data grid cluster to determine a status associated with each other cluster node in the data grid cluster when a disconnection event happens in the data grid cluster;
a cluster quorum policy defined in a cache configuration file associated with the data grid cluster, wherein the cluster quorum policy specifies a minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, and a time period that the data grid cluster defers to make a decision on whether or not to evict one or more cluster nodes from the data grid cluster; and
wherein, when one or more cluster nodes are detected to have been disconnected (disconnected nodes) from the data grid cluster as a result of a disconnection event, the data grid cluster defers to make a decision on whether or not to evict the disconnected nodes from the data grid cluster for the time period specified in the cluster quorum policy,
if a connection is reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy, the data grid cluster does not evict the disconnected nodes; and
if the connection is not reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy and a number of cluster nodes remaining in the cluster after excluding the disconnected nodes is equal to at least the minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, the data grid cluster does evict the disconnected nodes.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A method for using cluster level quorum in a data grid cluster comprising a plurality of cluster nodes, the method comprising:
running a cluster service on each cluster node of the plurality of cluster nodes in the data grid cluster;
collecting and maintaining, via the cluster service running on each particular cluster node, statistics regarding communication flow between the particular cluster node and other cluster nodes in the data grid cluster;
determining a status associated with each cluster node in the data grid cluster when a disconnection event happens;
providing a cluster quorum policy defined in a cache configuration file associated with the data grid cluster, wherein the cluster quorum policy specifies a minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, and a time period that the data grid cluster defers to make a decision on whether or not to evict one or more cluster nodes from the data grid cluster; and
in response to detecting one or more cluster nodes to have been disconnected (disconnected nodes) from the data grid cluster as a result of a disconnection event, deferring to make a decision for the time period specified in the cluster quorum policy on whether or not to evict the disconnected nodes from the data grid cluster based on the cluster quorum policy,
if a connection is reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy, determining not to evict the disconnected nodes; and
if the connection is not reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy and a number of cluster nodes remaining in the cluster after excluding the disconnected nodes is equal to at least the minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, the data grid cluster does evict the disconnected nodes.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
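To see how the minimum-node condition relates to split brain, consider a network partition in which each side views the other side's nodes as disconnected. The sketch below applies the claimed count check from each side's point of view. It is an illustration only: the function name `sides_permitted_to_evict` is hypothetical, and the observation that a minimum above half the cluster size lets at most one side proceed is ordinary majority reasoning rather than something the claim states.

```python
def sides_permitted_to_evict(partition_sizes, min_nodes):
    """Return the indexes of the partitions allowed to evict the other side.

    From inside a partition, the nodes remaining after excluding the
    disconnected ones are exactly the nodes of that partition, so a side
    may evict only if its own size is at least min_nodes.
    """
    return [i for i, size in enumerate(partition_sizes) if size >= min_nodes]


# A 5-node cluster splits into sides of 3 and 2 nodes.
majority_only = sides_permitted_to_evict([3, 2], min_nodes=3)
both_sides = sides_permitted_to_evict([3, 2], min_nodes=2)
```

With a minimum of 3, only the 3-node side (index 0) may evict, so a single cluster survives. With a minimum of 2, both sides pass the check and each could evict the other, which is the split brain outcome the policy is meant to avoid.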
19. A non-transitory machine readable medium having instructions stored thereon for using cluster quorum to prevent split brain scenario in a distributed data grid cluster comprising a plurality of cluster nodes, which instructions, when executed by a system, cause the system to perform steps comprising:
running a cluster service on each cluster node of the plurality of cluster nodes in the data grid cluster;
collecting and maintaining, via the cluster service running on each particular cluster node, statistics regarding communication flow between the particular cluster node and other cluster nodes in the data grid cluster;
determining a status associated with each cluster node in the data grid cluster when a disconnection event happens;
providing a cluster quorum policy defined in a cache configuration file associated with the data grid cluster, wherein the cluster quorum policy specifies a minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, and a time period that the data grid cluster defers to make a decision on whether or not to evict one or more cluster nodes from the data grid cluster; and
in response to detecting one or more cluster nodes to have been disconnected (disconnected nodes) from the data grid cluster as a result of a disconnection event, deferring to make a decision for the time period specified in the cluster quorum policy on whether or not to evict the disconnected nodes from the data grid cluster based on the cluster quorum policy,
if a connection is reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy, determining not to evict the disconnected nodes; and
if the connection is not reestablished to the disconnected nodes prior to expiration of the time period specified in the cluster quorum policy and a number of cluster nodes remaining in the cluster after excluding the disconnected nodes is equal to at least the minimum number of cluster nodes required to permit a decision whether to evict one or more cluster nodes from the data grid cluster, the data grid cluster does evict the disconnected nodes.
- View Dependent Claims (20)
Specification