Method and system for coordinated multiple cluster failover
Abstract
A hypercluster is a cluster of clusters. Each cluster has associated with it one or more resource groups, and independent node failures within the clusters are handled by platform-specific clustering software. The management of coordinated failovers across dependent or independent resources running on heterogeneous platforms is contemplated. A hypercluster manager running on each node in a cluster communicates with the platform-specific clustering software regarding any failure conditions and, using a rule-based decision-making system, determines the actions to take on the node. A plug-in extends the exit points definable in non-hypercluster clustering technologies. The failure notification is passed to other affected resource groups in the hypercluster.
42 Claims
1. A method for coordinating availability of data processing resources between a first cluster of nodes each controlled by a respective first cluster manager and a second cluster of nodes each controlled by a respective second cluster manager, the method comprising:
receiving a disruption signal from an exit program of one of the first cluster managers, the disruption signal being representative of a disruption event associated with a specific one of the nodes of the first cluster, the disruption signal being received by a first hypercluster manager of the specific one of the nodes of the first cluster;
deriving a local action code from a hypercluster rules list, the local action code corresponding to the disruption event and containing a cluster activation sequence for regulating the operation of one of the nodes of the second cluster; and
transmitting the local action code to the second cluster of nodes each including a second hypercluster manager for execution of the cluster activation sequence;
wherein the first cluster of nodes and the second cluster of nodes each function autonomously and communicate with each other by the local action code.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
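The three steps of claim 1 (receive a disruption signal, derive a local action code from the rules list, transmit it to the second cluster) can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the event names, the contents of `RULES`, and the `transmit` callback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class DisruptionSignal:
    """Signal raised by a cluster manager's exit program (field names are assumed)."""
    node: str   # the specific first-cluster node affected
    event: str  # hypothetical event name, e.g. "NODE_FAILED"


@dataclass(frozen=True)
class LocalActionCode:
    """Action code carrying a cluster activation sequence for the second cluster."""
    event: str
    activation_sequence: Tuple[str, ...]


# Hypothetical hypercluster rules list: disruption event -> activation sequence.
RULES: Dict[str, Tuple[str, ...]] = {
    "NODE_FAILED": ("quiesce_replication", "activate_standby"),
    "NODE_RESTORED": ("resume_replication",),
}


def derive_action_code(signal: DisruptionSignal,
                       rules: Dict[str, Tuple[str, ...]]) -> LocalActionCode:
    """Look up the activation sequence that corresponds to the disruption event."""
    try:
        sequence = rules[signal.event]
    except KeyError:
        raise LookupError(f"no hypercluster rule for event {signal.event!r}") from None
    return LocalActionCode(signal.event, sequence)


def handle_disruption(signal: DisruptionSignal,
                      rules: Dict[str, Tuple[str, ...]],
                      transmit: Callable[[LocalActionCode], None]) -> LocalActionCode:
    """Receive the signal, derive the action code, and transmit it onward."""
    code = derive_action_code(signal, rules)
    transmit(code)  # in a real system this would cross the cluster boundary
    return code


# Usage: a plain list stands in for the link to the second cluster.
sent = []
result = handle_disruption(DisruptionSignal("node-a1", "NODE_FAILED"), RULES, sent.append)
```

In this sketch the two clusters share nothing except the transmitted action code, which mirrors the claim's requirement that each cluster functions autonomously.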
15. An apparatus implemented within a local node for coordinating availability of data processing resources between the local node in a first cluster of local nodes each including a cluster manager and a remote node, the apparatus comprising:
a local event receiver for capturing local disruption event signals generated in response to local cluster heartbeat signals exchanged between the first cluster of nodes by the respective cluster managers, the local cluster heartbeat signals being representative of the status and condition of at least one of the nodes of the first cluster;
a hypercluster event translator for translating the local disruption event signals to a first universal event code;
a hypercluster event receiver for capturing a second universal event code from the remote node, the second universal event code being representative of a disruption event associated with the remote node;
a hypercluster heartbeat receiver for capturing hypercluster heartbeat signals from the second cluster of nodes, the hypercluster heartbeat signals being representative of the status and condition of the second cluster; and
a router for correlating a one of the first and second universal event codes to a cluster activation sequence operative to regulate the operation of at least one of the nodes in accordance with a set of hypercluster rules;
wherein the local cluster heartbeat signals are communicated within the first cluster and within the second cluster, and the hypercluster heartbeat signals are communicated between the first cluster and the second cluster.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
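The translator and router elements of claim 15 can be pictured as two table lookups: one that maps platform-specific disruption events to a universal event code, and one that maps a universal event code to a cluster activation sequence. The following Python sketch assumes hypothetical platform names, event names, universal codes, and rule contents; none of them come from the patent.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (platform, platform-specific event) -> universal event code.
TRANSLATION_TABLE: Dict[Tuple[str, str], str] = {
    ("platform_a", "NODE_DOWN"): "U_NODE_FAILURE",
    ("platform_b", "member_failed"): "U_NODE_FAILURE",
    ("platform_a", "NODE_UP"): "U_NODE_RECOVERY",
}

# Hypothetical hypercluster rules: universal event code -> activation sequence.
HYPERCLUSTER_RULES: Dict[str, Tuple[str, ...]] = {
    "U_NODE_FAILURE": ("stop_dependent_group", "start_backup_group"),
    "U_NODE_RECOVERY": ("start_dependent_group",),
}


def translate(platform: str, local_event: str) -> Optional[str]:
    """Translate a platform-specific event into a universal event code.

    Returns None when the event has no known translation.
    """
    return TRANSLATION_TABLE.get((platform, local_event))


def route(universal_code: str,
          rules: Dict[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Correlate a universal event code with an activation sequence.

    The code may originate locally (via translate) or from a remote node;
    an unknown code yields an empty sequence in this sketch.
    """
    return rules.get(universal_code, ())


# Usage: two different platforms report the same kind of failure.
code_a = translate("platform_a", "NODE_DOWN")
code_b = translate("platform_b", "member_failed")
sequence = route(code_a, HYPERCLUSTER_RULES)
```

Because both platforms resolve to the same universal code, the router needs only one rule per kind of event, regardless of which clustering software reported it.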
26. An article of manufacture comprising a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by the computer to perform a method for coordinating availability of data processing resources between a first cluster of nodes each controlled by a respective first cluster manager and a second cluster of nodes each controlled by a respective second cluster manager, the method comprising:
receiving a disruption signal from an exit program of one of the first cluster managers, the disruption signal being representative of a disruption event associated with a specific one of the nodes of the first cluster, the disruption signal being received by a first hypercluster manager of the specific one of the nodes of the first cluster;
deriving a local action code from a hypercluster rules list, the local action code corresponding to the disruption event and containing a cluster activation sequence for regulating the operation of one of the nodes of the second cluster; and
transmitting the local action code to the second cluster of nodes each including a second hypercluster manager for execution of the cluster activation sequence;
wherein the first cluster of nodes and the second cluster of nodes each function autonomously and communicate with each other by the local action code.
Dependent claims: 27, 28, 29, 30, 31, 32
33. An apparatus for coordinating availability of data processing resources between a local node in a first cluster and a remote node in a second cluster, the apparatus comprising:
a local event receiver for capturing local disruption events;
an event translator for translating the local disruption event to a universal event code;
a hypercluster event receiver for capturing remote disruption events from one of the nodes of the second cluster;
a router for correlating the universal event code to a cluster activation sequence in accordance with a set of hypercluster rules; and
a rule propagation module for receiving changes to the hypercluster rules list from the remote node, the rule propagation module further verifying the changes and applying the changes to the hypercluster rules list.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41, 42
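The rule propagation module of claim 33 receives rule changes from a remote node, verifies them, and applies them to the local rules list. The claim does not say how verification is performed; the Python sketch below assumes, purely for illustration, a structural check plus a SHA-256 digest sent alongside the changes. The function names and the rule format are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def digest(changes: Dict[str, List[str]]) -> str:
    """Compute a SHA-256 digest over a canonical JSON encoding of the changes."""
    payload = json.dumps(changes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify(changes: Dict[str, List[str]], expected_digest: str) -> bool:
    """Check that the changes are well formed and match the sender's digest."""
    well_formed = all(
        isinstance(code, str)
        and isinstance(steps, list)
        and all(isinstance(step, str) for step in steps)
        for code, steps in changes.items()
    )
    return well_formed and digest(changes) == expected_digest


def apply_changes(rules: Dict[str, List[str]],
                  changes: Dict[str, List[str]],
                  expected_digest: str) -> bool:
    """Apply verified changes to the local rules list; reject anything else."""
    if not verify(changes, expected_digest):
        return False
    rules.update(changes)
    return True


# Usage: one valid change set and one with a mismatched digest.
local_rules = {"U_NODE_FAILURE": ["start_backup_group"]}
incoming = {"U_NODE_RECOVERY": ["start_dependent_group"]}
accepted = apply_changes(local_rules, incoming, digest(incoming))
rejected = apply_changes(local_rules, {"U_BOGUS": ["noop"]}, "not-a-real-digest")
```

Verifying before applying keeps a corrupted or malformed update from altering the rules that drive failover decisions on the local node.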
Specification