Method for detecting and resolving a partition condition in a cluster
Abstract
A method for detecting and resolving a partition condition in a cluster of computers in a networked environment is described. In one example, the method includes creating a scratch pad area and dividing the scratch pad into slots. Each slot is associated with a node within the cluster. A generation number and a list of known nodes are recorded in each slot when a change of membership occurs in the cluster. The slots are compared to ensure that the generation number and the list of known nodes match in each slot, and the partition condition is resolved by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes.
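The detection step summarized in the abstract can be illustrated with a minimal Python sketch. The names `Slot`, `record_membership`, and `partition_detected` are hypothetical and do not come from the patent, and the scratch pad is modeled as an in-memory dictionary rather than storage shared by the cluster.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One node's slot in the scratch pad (field names are illustrative)."""
    generation: int = 0
    known_nodes: frozenset = frozenset()


def record_membership(scratch_pad, node_id, generation, known_nodes):
    """Called by a node when it observes a change of cluster membership."""
    scratch_pad[node_id] = Slot(generation, frozenset(known_nodes))


def partition_detected(scratch_pad):
    """True if the slots disagree on the generation number or the known-node list."""
    views = {(slot.generation, slot.known_nodes) for slot in scratch_pad.values()}
    return len(views) > 1


# Three nodes agree on generation 1 and the full membership list.
pad = {}
for node in (1, 2, 3):
    record_membership(pad, node, 1, {1, 2, 3})
agreed = partition_detected(pad)

# Node 3 loses contact: nodes 1 and 2 record one view, node 3 records another.
record_membership(pad, 1, 2, {1, 2})
record_membership(pad, 2, 2, {1, 2})
record_membership(pad, 3, 2, {3})
split = partition_detected(pad)
```

In this sketch, any disagreement between slots is treated as a possible partition. How the disagreement is then resolved is a separate step.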
17 Claims
1. A computer program product for detecting and resolving a partition condition in a cluster of computers in a networked environment, the computer program product stored on a non-transitory computer-readable medium and comprising:
computer-executable instructions for creating a scratch pad area accessible by the cluster of computers;
computer-executable instructions for dividing the scratch pad into a plurality of slots, each slot associated with one of a plurality of nodes within the cluster of computers, wherein each slot includes at least a heartbeat field indicating that cluster software is loaded on the node and a node state field indicating a current state of the node, wherein the current state identifies the node as being dead, alive, or preparing to shut down;
computer-executable instructions for recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node and wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers;
computer-executable instructions for comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
computer-executable instructions for resolving the partition condition by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes, computer-executable instructions requiring each node not on the list of surviving nodes to re-register with the cluster of computers; and
wherein the computer-executable instructions for comparing each slot include computer-executable instructions for finding a list with a master node to create the list of surviving nodes and shutting down each node not on the list with the master node.
Dependent claims: 2, 3, 4, 5, 6

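As an illustration of the slot layout and the master-node rule recited in claim 1, the following Python sketch builds a survivor list from the slot contents. It rests on assumptions that are not in the claim: the master's identifier is passed in as `master_id`, and when several alive slots list the master, the one with the highest generation number is taken as current. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    DEAD = "dead"
    ALIVE = "alive"
    SHUTTING_DOWN = "preparing to shut down"


@dataclass
class Slot:
    heartbeat: int          # advanced while cluster software is loaded on the node
    state: NodeState        # current state of the node
    generation: int         # membership generation number
    known_nodes: frozenset  # identifiers of nodes known to the writing node


def resolve_partition(slots, master_id):
    """Return (survivors, to_shut_down) based on the list that contains the master."""
    candidates = [
        s for s in slots.values()
        if s.state is NodeState.ALIVE and master_id in s.known_nodes
    ]
    if not candidates:
        raise ValueError("no alive slot lists the master node")
    # Assumption: the newest generation wins if a stale list still names the master.
    newest = max(candidates, key=lambda s: s.generation)
    survivors = set(newest.known_nodes)
    to_shut_down = set(slots) - survivors
    return survivors, to_shut_down


# Nodes 1-4 split into {1, 2} and {3, 4}; node 1 is assumed to be the master.
slots = {
    1: Slot(10, NodeState.ALIVE, 6, frozenset({1, 2})),
    2: Slot(12, NodeState.ALIVE, 6, frozenset({1, 2})),
    3: Slot(9, NodeState.ALIVE, 6, frozenset({3, 4})),
    4: Slot(11, NodeState.ALIVE, 6, frozenset({3, 4})),
}
survivors, to_shut_down = resolve_partition(slots, master_id=1)
```

Nodes in `to_shut_down` would then have to re-register with the cluster before rejoining. Resource re-allocation is outside the scope of this sketch.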
7. A method for detecting and resolving a partition condition in a cluster of computers in a networked environment, the method comprising:
creating a scratch pad area accessible by the cluster of computers;
dividing the scratch pad into a plurality of slots, each slot associated with one of a plurality of nodes within the cluster of computers;
recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node and wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers;
comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
creating a list of surviving nodes by listing a first set of nodes determined by comparing each slot of the plurality of slots, including finding a list with a master node to create the list of surviving nodes;
re-allocating appropriate resources to each of the surviving nodes; and
shutting down each node not on the list of surviving nodes with the master node by requiring each node not on the list of surviving nodes to write a special message in a respective slot for that node and then shut down immediately.
Dependent claims: 8, 9, 10, 11, 12, 16

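The final step of claim 7, in which a non-surviving node writes a special message to its own slot and then shuts down, might look like the Python sketch below. The marker value `LEAVING`, the dictionary-based slot, and the `shutdown` callback are all assumptions made for illustration. The claim does not specify the message format or the shutdown mechanism.

```python
LEAVING = "LEAVING"  # hypothetical special message; the claim does not define its content


def enforce_survivor_list(slots, survivors, node_id, shutdown):
    """Run on each node: if this node is not a survivor, mark its slot, then shut down.

    Returns True if the node shut itself down, False if it survives.
    """
    if node_id in survivors:
        return False
    # The message is written first so that survivors can see the node left on purpose.
    slots[node_id]["message"] = LEAVING
    shutdown()
    return True


# Usage: node 3 is not on the survivor list; a stub records what the slot held at shutdown.
slots = {1: {"message": None}, 2: {"message": None}, 3: {"message": None}}
seen_at_shutdown = []
left = enforce_survivor_list(
    slots, {1, 2}, 3, shutdown=lambda: seen_at_shutdown.append(slots[3]["message"])
)
stayed = enforce_survivor_list(slots, {1, 2}, 1, shutdown=lambda: None)
```

Passing the shutdown action as a callback keeps the ordering (write, then stop) easy to check without halting a real process.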
13. A method for detecting and resolving a partition condition in a cluster of computers in a networked environment, the method comprising:
creating a scratch pad area accessible by the cluster of computers;
dividing the scratch pad into a plurality of slots, each slot associated with one of a plurality of nodes within the cluster of computers, wherein each slot includes at least a heartbeat field indicating that cluster software is loaded on the node and a node state field indicating a current state of the node, wherein the current state identifies the node as being dead, alive, or preparing to shut down;
recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node and wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers;
comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
resolving the partition condition by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes;
requiring each node not on the list of surviving nodes to re-register with the cluster of computers; and
instructing all nodes in the cluster to shut down if at least one non-surviving node fails to update its node state to indicate that it is not a surviving node.

14. A method for detecting and resolving a partition condition in a cluster of computers in a networked environment, the method comprising:
maintaining a scratch pad area accessible by the cluster of computers, wherein the scratch pad area is divided into a plurality of slots and each slot is associated with at least one of a plurality of nodes within the cluster of computers;
recording in the plurality of slots a generation number, a list of known nodes by each one of the plurality of nodes, and a node state indicating whether the node is a surviving node within the cluster, wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers, and wherein the node state is updated at least when the node is not a surviving node;
comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
resolving the partition condition by creating a list of surviving nodes, wherein nodes not on the list of surviving nodes update their node state to indicate that they are not a surviving node; and
instructing all nodes in the cluster to shut down if at least one non-surviving node fails to update its node state to indicate that it is not a surviving node.
Dependent claims: 15, 17

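The fail-safe that closes claims 13 and 14 (shut down every node if a non-surviving node never acknowledges its status) can be sketched in Python as a single check over the recorded node states. The state label `NOT_SURVIVING` and the function name are hypothetical, and the sketch assumes the check runs only after whatever grace period the cluster allows for slot updates, which is not modeled here.

```python
NOT_SURVIVING = "not_surviving"  # hypothetical label for the acknowledged state


def must_shut_down_cluster(node_states, survivors):
    """True if any node outside the survivor list has not marked itself as non-surviving."""
    return any(
        state != NOT_SURVIVING
        for node_id, state in node_states.items()
        if node_id not in survivors
    )


survivors = {1, 2}

# Node 3 acknowledged its status, but node 4 still reports itself as alive.
unacknowledged = must_shut_down_cluster(
    {1: "alive", 2: "alive", 3: NOT_SURVIVING, 4: "alive"}, survivors
)

# Both non-surviving nodes acknowledged, so the surviving nodes can continue.
acknowledged = must_shut_down_cluster(
    {1: "alive", 2: "alive", 3: NOT_SURVIVING, 4: NOT_SURVIVING}, survivors
)
```

A node that cannot update its slot may still be holding shared resources, which is presumably why this sketch treats a missing acknowledgement as grounds for stopping the whole cluster.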
Specification