Method for detecting and resolving a partition condition in a cluster
First Claim
1. A method for detecting and resolving a partition condition in a cluster of computers in a networked environment, the method comprising:
creating a scratch pad area accessible by the cluster of computers;
dividing the scratch pad into a plurality of slots, each slot associated with one of a plurality of nodes within the cluster of computers;
recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node and wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers;
comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
resolving the partition condition by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes; and
ordering a first node not on the list of surviving nodes to halt execution by writing, by a second node on the list of surviving nodes, a termination message into the slot associated with the first node.
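The final step of the claim, in which a surviving node orders an excluded node to halt by writing into that node's slot, can be illustrated with a minimal sketch. The claim does not define a slot layout or a message format, so the dictionary-based slots, the `TERMINATE` marker, and the function names `order_halt` and `should_halt` are all hypothetical and serve only as an illustration.

```python
# Hypothetical marker; the claim only says "a termination message".
TERMINATE = "TERMINATE"


def order_halt(slots, survivors, writer):
    """Write a termination message into the slot of every non-surviving node.

    slots     -- dict mapping node id to that node's slot (a dict), assumed layout
    survivors -- collection of node ids on the list of surviving nodes
    writer    -- node id performing the write; must itself be a survivor
    """
    if writer not in survivors:
        raise ValueError("only a surviving node may order another node to halt")
    halted = []
    for node, slot in slots.items():
        if node not in survivors:
            slot["message"] = TERMINATE
            halted.append(node)
    return halted


def should_halt(slots, node):
    """A node polls its own slot and halts if a termination message is present."""
    return slots[node].get("message") == TERMINATE


slots = {1: {}, 2: {}, 3: {}}
halted = order_halt(slots, survivors=[1, 2], writer=1)
```

In this sketch node 3 is the only node outside the survivor list, so it is the only slot that receives the message; each node is assumed to check its own slot periodically with `should_halt`.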
Abstract
A method and system to detect and resolve a partition condition in a cluster of computers in a networked environment is described. The method can include: creating a scratch pad area accessible by the cluster of computers; dividing the scratch pad into a plurality of slots; recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node; comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots; and resolving the partition condition by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes.
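The detection steps in the abstract can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `Slot` and `ScratchPad` classes, integer node ids, and the function names are assumptions. The survivor policy in particular (keep the largest group of nodes that recorded an identical view, breaking ties toward the group containing the lowest node id) is a placeholder, because the abstract does not say how the list of surviving nodes is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One node's record: a generation number and the nodes it knows about."""
    generation: int = 0
    known_nodes: frozenset = frozenset()


class ScratchPad:
    """Shared area divided into one slot per node (in-memory stand-in)."""

    def __init__(self, node_ids):
        self.slots = {node: Slot() for node in node_ids}

    def record(self, writer, generation, known_nodes):
        # Each node writes only its own slot, on a membership change.
        self.slots[writer] = Slot(generation, frozenset(known_nodes))

    def is_partitioned(self):
        # A partition is suspected when the slots do not all hold the same
        # (generation number, known-node list) pair.
        views = {(s.generation, s.known_nodes) for s in self.slots.values()}
        return len(views) > 1


def choose_survivors(pad):
    """Assumed policy: largest group of agreeing writers, lowest id wins ties."""
    groups = {}
    for node, slot in pad.slots.items():
        groups.setdefault((slot.generation, slot.known_nodes), []).append(node)
    best = max(groups.values(), key=lambda g: (len(g), -min(g)))
    return sorted(best)


pad = ScratchPad([1, 2, 3])
pad.record(1, generation=2, known_nodes=[1, 2])
pad.record(2, generation=2, known_nodes=[1, 2])
pad.record(3, generation=2, known_nodes=[3])
partitioned = pad.is_partitioned()
survivors = choose_survivors(pad)
```

Here nodes 1 and 2 agree on the membership while node 3 sees only itself, so the slots disagree and the sketch selects nodes 1 and 2 as survivors. Re-allocating resources to the survivors is left out because the abstract gives no detail on it.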
9 Claims
1. A method for detecting and resolving a partition condition in a cluster of computers in a networked environment, the method comprising:
creating a scratch pad area accessible by the cluster of computers;
dividing the scratch pad into a plurality of slots, each slot associated with one of a plurality of nodes within the cluster of computers;
recording in the plurality of slots, a generation number and a list of known nodes by each one of the plurality of nodes, wherein an identifier is written in the list for each node that is known to a writing node and wherein the generation number and the list of known nodes is recorded when a change of membership occurs in the cluster of computers;
comparing each slot of the plurality of slots to ensure the generation number and the list of known nodes matches in each slot of the plurality of slots;
resolving the partition condition by creating a list of surviving nodes and re-allocating appropriate resources to each of the surviving nodes; and
ordering a first node not on the list of surviving nodes to halt execution by writing, by a second node on the list of surviving nodes, a termination message into the slot associated with the first node.
Specification