Method and system for coordinated multiple cluster failover
First Claim
1. A method for coordinating availability of data processing resources between a first cluster of nodes each controlled by a respective first cluster manager and multiple other clusters of nodes each controlled by a respective second cluster manager, the method comprising:
receiving a disruption signal from an exit program of one of the first cluster managers, the disruption signal being representative of a disruption event associated with a specific one of the nodes of the first cluster, the disruption signal being received by a first hypercluster manager of the specific one of the nodes of the first cluster;
deriving a local action code and a hypercluster event code from a hypercluster rules list, the local action code and the hypercluster event code corresponding to the disruption event, the local action code having an associated local cluster activation sequence for regulating the operation of the first cluster of nodes, and the hypercluster event code having an associated cluster activation sequence for regulating the operation of the respective multiple other clusters of nodes;
transmitting the hypercluster event code to the multiple other clusters of nodes, each of the nodes of the multiple other clusters of nodes including a second hypercluster manager for execution of a remote cluster activation sequence thereon;
receiving a token on the first hypercluster manager transmitted thereto in response to a completed execution of the remote cluster activation sequence on one of the nodes of the multiple other clusters of nodes; and
executing the local cluster activation sequence on the first cluster of nodes upon receipt of the token;
wherein the first cluster of nodes and the multiple other clusters of nodes each function independent of each other, the nodes of the first cluster of nodes communicate with each other by a first set of messages, and the nodes of a given one of the multiple other clusters of nodes communicate with each other by a second set of messages.
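The ordering of the claimed steps (derive both codes from the rules list, transmit the hypercluster event code, wait for a token, then run the local sequence) may be easier to follow as a small sketch. Everything named below is hypothetical and not taken from the patent: the `RULES` table, the event and code strings, the class names, and the token format are illustrative assumptions, and the remote call is modeled as a synchronous in-process call rather than a network message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical rules list: disruption event -> (local action code, hypercluster event code).
RULES: Dict[str, Tuple[str, str]] = {
    "NODE_FAILURE": ("LOCAL_FAILOVER", "HC_PEER_FAILOVER"),
    "PLANNED_SWITCH": ("LOCAL_SWITCHOVER", "HC_PEER_SWITCHOVER"),
}


@dataclass
class RemoteHyperclusterManager:
    """Stands in for the second hypercluster manager on a node of another cluster."""
    name: str
    log: List[str] = field(default_factory=list)

    def handle_event(self, event_code: str) -> str:
        # Run the remote cluster activation sequence, then hand back a completion token.
        self.log.append(f"{self.name}: ran remote sequence for {event_code}")
        return f"token:{self.name}:{event_code}"


@dataclass
class FirstHyperclusterManager:
    """Stands in for the first hypercluster manager on the disrupted node."""
    remotes: List[RemoteHyperclusterManager]
    log: List[str] = field(default_factory=list)

    def on_disruption(self, event: str) -> List[str]:
        # Derive the local action code and the hypercluster event code together.
        local_action, hc_event = RULES[event]
        # Transmit the hypercluster event code to the other clusters (assumed synchronous).
        tokens = [remote.handle_event(hc_event) for remote in self.remotes]
        # Run the local cluster activation sequence only once a token has been received.
        if tokens:
            self.log.append(f"local: ran sequence for {local_action}")
        return tokens
```

In this sketch the local sequence is gated on having received at least one token, which mirrors the claim wording "upon receipt of the token"; collecting a token from every remote before proceeding is a simplification of the synchronous model, not something the claim requires.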
Abstract
A hypercluster is a cluster of clusters. Each cluster has one or more associated resource groups, and independent node failures within a cluster are handled by platform-specific clustering software. The management of coordinated failovers across dependent or independent resources running on heterogeneous platforms is contemplated. A hypercluster manager running on every node in a cluster communicates with the platform-specific clustering software regarding any failure conditions and, using a rule-based decision-making system, determines the actions to take on the node. A plug-in extends the exit points definable in non-hypercluster clustering technologies. The failure notification is passed to the other affected resource groups in the hypercluster.
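One way to picture the plug-in described in the abstract is as a thin adapter that each platform's clustering software calls at its exit point, translating a native notification into a common disruption signal for the hypercluster manager. The sketch below is an assumption-laden illustration only: the platform names, native codes, `DisruptionSignal` type, and `make_exit_plugin` helper are all hypothetical and do not correspond to any real clustering product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DisruptionSignal:
    node: str
    event: str  # normalized, platform-independent event name


# Hypothetical translation tables: native exit-point code -> normalized event.
PLATFORM_EVENTS: Dict[str, Dict[str, str]] = {
    "platform_a": {"FAIL": "NODE_FAILURE", "SWITCH": "PLANNED_SWITCH"},
    "platform_b": {"node_down": "NODE_FAILURE"},
}


def make_exit_plugin(
    platform: str,
    node: str,
    notify: Callable[[DisruptionSignal], None],
) -> Callable[[str], None]:
    """Build the callback a platform's clustering software would invoke at its exit point."""
    table = PLATFORM_EVENTS[platform]

    def exit_program(native_code: str) -> None:
        event = table.get(native_code)
        # Codes with no hypercluster meaning are left to the platform software alone.
        if event is not None:
            notify(DisruptionSignal(node=node, event=event))

    return exit_program
```

Keeping the translation tables separate from the notification path means heterogeneous platforms can feed one hypercluster manager without that manager knowing any platform's native codes.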
7 Claims
Specification