Resource group quorum scheme for highly scalable and highly available cluster system management
Abstract
A cluster system is treated as a set of resource groups, each resource group including a highly available application and the resources upon which it depends. A resource group may have between 2 and M data processing systems, where M is small relative to the total cluster size N. Configuration and status information for the resource group is fully replicated only on those data processing systems that are members of the resource group. In the event of failure of a data processing system within the cluster, only resource groups including the failed data processing system are affected. Each resource group having a quorum of its data processing systems available continues to provide services, allowing many applications within the cluster to continue functioning while the cluster is restored.
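The per-group quorum test described in the abstract can be illustrated with a short sketch. The `ResourceGroup` class, its fields, and the node names are hypothetical and do not come from the patent. The sketch assumes that quorum means a strict majority of a group's members being online, as recited in claim 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceGroup:
    """Hypothetical model of a resource group: a service plus its member nodes."""
    name: str
    members: frozenset  # names of the data processing systems in the group

    def has_quorum(self, online: set) -> bool:
        # Assumption: quorum is a strict majority of the group's members.
        alive = len(self.members & online)
        return 2 * alive > len(self.members)


groups = [
    ResourceGroup("db", frozenset({"n1", "n2", "n3"})),
    ResourceGroup("web", frozenset({"n3", "n4", "n5"})),
]

# n2 and n3 have failed; every other node is still online.
online = {"n1", "n4", "n5"}
status = {g.name: g.has_quorum(online) for g in groups}
```

Under these assumptions the hypothetical "web" group keeps two of its three members and stays in quorum, while "db" is left with one of three and does not.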
114 Citations
18 Claims
-
1. A method for providing highly available computing services in a cluster system, comprising:
-
segregating data processing systems in the cluster system into at least one resource group, each resource group including at least two data processing systems and related resources for providing a respective computing service;
prior to executing a desired computing service, determining whether a resource group responsible for the desired computing service is in a quorum state by determining whether a majority of data processing systems in the resource group are on line; and
responsive to determining that the resource group responsible for providing the desired service is in the quorum state, providing the desired computing service.
-
2. The method of claim 1, wherein the step of determining whether a resource group responsible for the desired computing service is in a quorum state further comprises:
determining whether at least one data processing system in the resource group is online.
-
3. The method of claim 1, further comprising:
detecting a failure of a data processing system within the cluster system.
-
4. The method of claim 3, further comprising:
determining whether the resource group includes the failed data processing system.
-
5. The method of claim 4, further comprising:
identifying all resource groups within the cluster system including the failed data processing system.
-
6. A cluster system, comprising:
-
a plurality of data processing systems segregated into a plurality of resource groups, each resource group including at least two data processing systems and providing a respective computing service;
at least one network connecting the data processing systems in the cluster system;
a configuration database distributed among the data processing systems, each data processing system within the cluster system containing cluster-level configuration and status information and resource group configuration and status information for every resource group including the data processing system but no resource group configuration and status information for any resource group not including the data processing system; and
a failover mechanism identifying, in response to failure of a data processing system within the cluster system, every resource group including the failed data processing system and determining, for each identified resource group including the failed data processing system, whether a quorum exists for the respective identified resource group, wherein each identified resource group provides a respective computing service if a quorum is available.
-
7. The cluster system of claim 6, further comprising:
means for suspending the computing service provided by a resource group including the failed data processing system if a quorum of the resource group is not available.
-
8. The cluster system of claim 6, further comprising:
means for reintegrating the failed data processing system upon restoration.
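The scoping of the configuration database in claim 6 can be sketched as a filter over group membership. The function name `local_config_view` and the dictionary layout are hypothetical. The sketch covers only the resource-group portion of the data and leaves out the cluster-level information that claim 6 says every data processing system also holds.

```python
def local_config_view(node, group_configs):
    """Return the resource-group configuration a node would store.

    group_configs maps a group name to a dict holding a "members" set plus
    arbitrary settings. Only groups that include `node` are kept.
    """
    return {
        name: cfg
        for name, cfg in group_configs.items()
        if node in cfg["members"]
    }


group_configs = {
    "db": {"members": {"n1", "n2", "n3"}, "service": "database"},
    "web": {"members": {"n3", "n4"}, "service": "http"},
}

view_n1 = local_config_view("n1", group_configs)  # holds only "db"
view_n3 = local_config_view("n3", group_configs)  # holds "db" and "web"
```

In this sketch the amount of group data a node holds grows with the number of groups it belongs to, not with the size of the cluster.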
-
9. A data processing system, further comprising:
-
a processor executing instructions for providing a computing service;
a network connection permitting the data processing system to be connected to a cluster system segregated into a plurality of resource groups;
a memory containing configuration information identifying each resource group within the cluster system including the data processing system; and
a failover mechanism detecting failure of any other data processing system within a resource group including the data processing system, the failover mechanism determining whether the resource group including the failed data processing system is in a quorum state by determining whether a majority of data processing systems in the resource group are on line and permitting the data processing system to continue providing the computing service if the resource group including the failed data processing system is in the quorum state.
-
10. The data processing system of claim 9, further comprising:
means for suspending the computing service if the resource group including the failed data processing system is not in the quorum state.
-
11. The data processing system of claim 9, further comprising:
means for serving requests for the computing service from the data processing system if the resource group including the failed data processing system is in the quorum state.
-
12. The data processing system of claim 9, wherein the computing service comprises a highly available application.
-
14. The method of claim 12, wherein the step of determining whether the resource group is in a quorum state further comprises:
determining whether at least one data processing system within the resource group is online.
-
15. The method of claim 12, wherein the step of determining whether the resource group is in a quorum state further comprises:
determining whether the resource group includes sufficient functioning resources to serve requests to the application server associated with the resource group.
-
13. A method of responding to a data processing system failure within a cluster system segregated into a plurality of resource groups, each resource group including at least two data processing systems and an application server, comprising:
-
identifying every resource group including the failed data processing system;
for each resource group including the failed data processing system, determining whether the resource group is in a quorum state by determining whether a majority of data processing systems within the resource group are on line;
for each resource group including the failed data processing system which is in a quorum state, serving requests to the application server; and
for each resource group including the failed data processing system which is not in a quorum state, suspending the application server.
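The steps of claim 13 can be sketched as a single pass over the resource groups. The function `respond_to_failure`, its return shape, and the group and node names are hypothetical. Quorum is again assumed to be a strict majority of the group's members.

```python
def respond_to_failure(failed, groups, online):
    """Decide, for each affected group, whether to keep serving or suspend.

    groups maps a group name to the set of its member nodes. `online` is the
    set of nodes considered reachable; the failed node is removed from it.
    Returns two lists of group names: (serving, suspended).
    """
    online = set(online) - {failed}
    serving, suspended = [], []
    for name, members in sorted(groups.items()):
        if failed not in members:
            continue  # groups without the failed node are unaffected
        if 2 * len(members & online) > len(members):
            serving.append(name)    # majority still online: keep serving
        else:
            suspended.append(name)  # no majority: suspend the application server
    return serving, suspended


groups = {
    "db": {"n1", "n2", "n3"},
    "web": {"n3", "n4"},
    "mail": {"n5", "n6", "n7"},
}
all_nodes = {"n1", "n2", "n3", "n4", "n5", "n6", "n7"}
serving, suspended = respond_to_failure("n3", groups, all_nodes)
```

In this example "mail" appears in neither list because it does not contain the failed node, so it is never examined.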
-
-
16. A computer program product within a computer usable medium, comprising:
-
instructions embodied within said computer usable medium, for segregating data processing systems in a network into at least one resource group, each resource group including at least two data processing systems and related resources for providing a respective computing service;
instructions embodied within said computer usable medium, for determining whether a resource group responsible for the desired computing service is in a quorum state prior to executing a desired computing service by determining whether a majority of data processing systems in the resource group are on line; and
instructions embodied within said computer usable medium, for providing the desired computing service in response to determining that the resource group responsible for providing the desired service is in the quorum state.
-
17. The computer program product of claim 16, wherein the instructions embodied within said computer usable medium for determining whether a resource group responsible for the desired computing service is in a quorum state further comprise:
instructions for determining whether at least one data processing system in the resource group is online.
-
18. The computer program product of claim 16, wherein the instructions embodied within said computer usable medium for determining whether a resource group responsible for the desired computing service is in a quorum state further comprise:
instructions for determining whether the resource group responsible for providing the desired computing service includes sufficient functioning resources to serve a request for the desired computing resources.
Specification