System and method for resolving cluster partitions in out-of-band storage virtualization environments
Abstract
Systems, methods, apparatus and software can configure, support, and make use of a coordinator virtual device to determine which node or nodes of a cluster should be ejected from the cluster as a result of a cluster partition or other error event. Fencing software operating on the cluster nodes monitors the cluster for a cluster partition (split-brain) event, and when such an event occurs, software on the nodes attempts to gain control of the coordinator virtual device. A node that succeeds in gaining control of the coordinator virtual device survives. Nodes failing to gain control of the coordinator virtual device remove themselves or are removed from the cluster. The coordinator virtual device can be established by a virtual device configuration server which provides coordinator virtual device access to cluster nodes acting as virtual device configuration clients.
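The race described above can be sketched as follows. This is a minimal, hypothetical illustration and not the patented implementation.

It makes several assumptions:
- A partition is detected when some configured member stops being reachable, for example because its heartbeats stopped.
- The coordinator virtual device is modeled as a single owner slot with an atomic test-and-set. Here a `threading.Lock` stands in for the atomic arbitration that the shared storage would provide across real machines.
- The names `partition_detected`, `CoordinatorVirtualDevice`, `try_acquire`, and `resolve_partition` are invented for this example.

```python
import threading
from typing import Dict, Iterable, List, Optional, Tuple


def partition_detected(members: Iterable[str], reachable: Iterable[str]) -> bool:
    # Assumed trigger: a configured member is no longer reachable
    # (e.g. its heartbeats stopped), so the cluster may have split.
    return not set(members) <= set(reachable)


class CoordinatorVirtualDevice:
    """Hypothetical coordinator virtual device: one owner slot whose claim
    is atomic, so exactly one racing node can gain control."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[str] = None

    def try_acquire(self, node_id: str) -> bool:
        # Atomic test-and-set: the first node to claim the device owns it;
        # a later claim succeeds only if it comes from the current owner.
        with self._lock:
            if self._owner is None:
                self._owner = node_id
            return self._owner == node_id


def resolve_partition(
    device: CoordinatorVirtualDevice, racers: List[str]
) -> Tuple[List[str], List[str]]:
    """Each node races for the device concurrently.

    Returns (survivors, ejected). Ejected nodes are the ones that would
    remove themselves, or be removed, from the cluster.
    """
    results: Dict[str, bool] = {}

    def race(node_id: str) -> None:
        results[node_id] = device.try_acquire(node_id)

    threads = [threading.Thread(target=race, args=(n,)) for n in racers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    survivors = [n for n in racers if results[n]]
    ejected = [n for n in racers if not results[n]]
    return survivors, ejected
```

For example, calling `resolve_partition(CoordinatorVirtualDevice(), ["n1", "n2", "n3"])` after `partition_detected` returns `True` leaves one survivor and two ejected nodes. Which node wins depends on thread scheduling, as in a real race.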
25 Claims
1. A method comprising:
providing a coordinator virtual device corresponding to a portion of a physical data storage device;
detecting when a computer system cluster, including a plurality of nodes, is partitioned;
a first node of the plurality of nodes engaging in a race with a second node of the plurality of nodes to gain control of the coordinator virtual device; and
removing the first node of the plurality of nodes from the computer system cluster in response to the first node failing to gain control of the coordinator virtual device by losing the race, wherein the removing comprises disabling the first node from accessing the portion of the physical data storage device.
(Dependent claims: 2-12)
13. A system comprising:
a first data storage device;
a virtual device configuration server coupled to the first data storage device and including a first memory and a first processor configured to provide a coordinator virtual device corresponding to a portion of the first data storage device;
a plurality of virtual device configuration clients configured as a computer system cluster, a first of the plurality of virtual device configuration clients including a second memory and a second processor configured to:
detect when the computer system cluster is partitioned,
engage in a race with a second of the plurality of virtual device configuration clients to gain control of the coordinator virtual device corresponding to the portion of the first data storage device, and
disable the first of the plurality of virtual device configuration clients from accessing the portion of the first data storage device by removing the first of the plurality of virtual device configuration clients from the computer system cluster in response to the first of the plurality of virtual device configuration clients failing to gain control of the coordinator virtual device by losing the race.
(Dependent claims: 14-21)
22. An apparatus comprising:
a means for providing a coordinator virtual device corresponding to a portion of a physical data storage device;
a means for detecting when a computer system cluster, including a plurality of nodes, is partitioned;
a means for engaging a first node of the plurality of nodes in a race with a second node of the plurality of nodes to gain control of the coordinator virtual device; and
a means for disabling the first node from accessing the portion of the physical data storage device by removing the first node of the plurality of nodes from the computer system cluster in response to the first node failing to gain control of the coordinator virtual device by losing the race.
(Dependent claims: 23-25)
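The server/client arrangement of claim 13 can be pictured with the sketch below. It is hypothetical and not the claimed implementation.

It assumes the following:
- The coordinator virtual device is backed by one extent (byte offset and length) of the first data storage device.
- The out-of-band configuration server serializes race requests from its clients, so the first registered client to ask wins.
- Losing a race revokes that client's access to the extent, which stands for disabling it and removing it from the cluster.
- `Extent`, `VirtualDeviceConfigServer`, `register`, `race`, and `can_access` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Extent:
    """Hypothetical portion of a physical storage device."""
    device: str
    offset: int  # first byte of the portion
    length: int  # size of the portion in bytes


@dataclass
class VirtualDeviceConfigServer:
    """Hypothetical configuration server that exports a coordinator virtual
    device backed by one extent and tracks which clients may access it."""
    extent: Extent
    allowed: Set[str] = field(default_factory=set)
    owner: Optional[str] = None

    def register(self, client_id: str) -> None:
        # A cluster node joins as a virtual device configuration client.
        self.allowed.add(client_id)

    def race(self, client_id: str) -> bool:
        # Requests are assumed to be serialized by the server, so the first
        # registered client to ask after a partition gains control.
        if client_id not in self.allowed:
            return False
        if self.owner is None:
            self.owner = client_id
        won = self.owner == client_id
        if not won:
            # Loser is removed from the cluster: revoke its access to the extent.
            self.allowed.discard(client_id)
        return won

    def can_access(self, client_id: str, offset: int) -> bool:
        # I/O is permitted only inside the extent and only for allowed clients.
        e = self.extent
        in_extent = e.offset <= offset < e.offset + e.length
        return in_extent and client_id in self.allowed
```

Keeping the access list on the server, rather than trusting each client to stop issuing I/O, means a node that loses the race is fenced even if it keeps running.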
Specification