Apparatus and method for quorum-based power-down of unresponsive servers in a computer cluster
Abstract
An apparatus and method provide a quorum-based server power-down mechanism that allows a manager in a computer cluster to power down unresponsive servers in a manner that ensures an unresponsive server does not become responsive again. For a manager to power down servers in the cluster, the cluster must have quorum, meaning that a majority of the computers in the cluster must be responsive. If the cluster has quorum and the manager server did not fail, the manager causes the failed server(s) to be powered down. If the manager server did fail, the new manager causes all unresponsive servers in the cluster to be powered down. If the power-down is successful, the resources on the failed server(s) may be failed over to other servers in the cluster that were not powered down. If the power-down is not successful, the cluster is disabled.
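The decision flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `handle_failure`, its parameters, and the string results are hypothetical, the power-down operation is passed in as a callable that returns True on success, and the no-quorum branch simply does nothing because the abstract does not say what happens in that case.

```python
from typing import Callable, Iterable, Set


def handle_failure(
    all_servers: Set[str],
    responsive: Set[str],
    failed: Iterable[str],
    manager_failed: bool,
    power_down: Callable[[str], bool],
) -> str:
    """Hypothetical sketch of the abstract's flow, run by the current manager."""
    # Quorum: a strict majority of the cluster's servers must be responsive.
    if 2 * len(responsive) <= len(all_servers):
        # Assumption: without quorum, no power-down command is issued.
        return "no-quorum"

    # If the manager itself failed, the new manager targets every unresponsive
    # server; otherwise only the server(s) reported as failed are targeted.
    targets = (all_servers - responsive) if manager_failed else set(failed)

    # Attempt every power-down; a single failure disables the cluster.
    results = [power_down(server) for server in sorted(targets)]
    if not all(results):
        return "cluster-disabled"

    # Resources may now be failed over to servers that were not powered down.
    return "failover"
```

Passing `power_down` as a parameter keeps the sketch independent of any particular power-control interface, which the abstract does not name.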
50 Claims
1. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a server process residing in the memory and executed by the at least one processor;
a cluster engine residing in the memory and executed by the at least one processor, the cluster engine handling communications between the server process and other servers in a cluster; and
a quorum-based server power-down mechanism residing in the memory and executed by the at least one processor, the quorum-based server power-down mechanism determining whether the server process is part of a group of servers that include a majority of servers in the cluster, and if so, the quorum-based server power-down mechanism issuing a command to power down at least one of the other servers in the cluster.
11. A networked computer system comprising:
a plurality of servers coupled together via a network into a cluster, each server comprising:
a cluster engine that handles communications between servers in the cluster; and
a quorum-based server power-down mechanism that determines whether a server is part of a group of servers that includes a majority of servers in the cluster, and if so, the quorum-based server power-down mechanism issuing a command to power down at least one server in the cluster.
21. An apparatus comprising:
(A) at least one processor;
(B) a memory coupled to the at least one processor;
(C) a server process residing in the memory and executed by the at least one processor;
(D) a cluster engine residing in the memory and executed by the at least one processor, the cluster engine handling communications between the server process and other servers in a cluster, the cluster engine comprising:
(D1) a heartbeat mechanism that sends a periodic message to the other servers in the cluster to indicate the server process is functioning properly and that receives periodic messages from the other servers in the cluster that indicate the other servers in the cluster are functioning properly;
(D2) a membership change mechanism that generates a membership change message to all servers in the cluster when any of the servers in the cluster become unresponsive;
(E) a quorum-based server power-down mechanism residing in the memory and executed by the at least one processor, the quorum-based server power-down mechanism determining whether the server process is part of a group of servers that includes a majority of servers in the cluster, and if so, the quorum-based server power-down mechanism determining whether a manager of the cluster failed when an indication of a server failure is received, and if a manager of the cluster failed, the quorum-based server power-down mechanism issues at least one command to power down all unresponsive servers in the cluster, and if a manager of the cluster did not fail, the quorum-based server power-down mechanism issues at least one command to power down a server corresponding to the received indication of server failure.
25. A computer-implemented method for handling an unresponsive server in a cluster, the method comprising the steps of:
determining when a server in the cluster becomes unresponsive;
determining whether a majority of servers in the cluster are responsive; and
if a majority of servers in the cluster are responsive, issuing a command to power down at least one server in the cluster.
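The majority test in claim 25 matters most when a cluster splits into groups that cannot reach each other: two disjoint groups cannot both contain more than half of the servers, so at most one group can issue power-down commands. The Python sketch below illustrates this, assuming "majority" means a strict majority (more than half); the function name `groups_with_quorum` and the server names are hypothetical.

```python
from typing import FrozenSet, List


def groups_with_quorum(
    cluster_size: int, groups: List[FrozenSet[str]]
) -> List[FrozenSet[str]]:
    """Return the groups that hold a strict majority of the cluster."""
    return [g for g in groups if 2 * len(g) > cluster_size]


# Five servers split 3/2: only the three-server side may power anything down.
split_3_2 = [frozenset({"s1", "s2", "s3"}), frozenset({"s4", "s5"})]
print(groups_with_quorum(5, split_3_2))

# Four servers split 2/2: neither side has a majority, so no commands are issued.
split_2_2 = [frozenset({"s1", "s2"}), frozenset({"s3", "s4"})]
print(groups_with_quorum(4, split_2_2))
```

Under this assumption an even split leaves no group with quorum, which is why the sketch compares `2 * len(g)` against the cluster size rather than using "at least half".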
33. A computer-implemented method for handling an unresponsive server in a cluster, the method comprising the steps of:
each server in the cluster sending a periodic message to other servers in the cluster to indicate proper function of the server sending the periodic message;
each server in the cluster receiving periodic messages from other servers in the cluster that indicate the other servers in the cluster are functioning properly;
generating a membership change message to all servers in the cluster when any of the servers in the cluster become unresponsive;
determining whether a majority of servers in the cluster are responsive;
receiving an indication of a server failure;
if the majority of servers in the cluster are responsive, performing the steps of:
determining whether the indication of the server failure indicates a manager of the cluster failed;
if the manager of the cluster failed, issuing at least one command to power down all unresponsive servers in the cluster; and
if the manager of the cluster did not fail, issuing at least one command to power down a server corresponding to the received indication of server failure.
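The heartbeat and membership-change steps of claim 33 can be illustrated with a small Python sketch. It is only one possible reading: the claims do not say how a server is judged unresponsive, so the fixed `timeout` threshold, the class name `HeartbeatMonitor`, and its methods are all assumptions made for illustration.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Hypothetical tracker: a peer is unresponsive once its heartbeat is overdue."""

    def __init__(self, peers: Set[str], timeout: float, now: float) -> None:
        self.timeout = timeout
        # Assumption: every peer counts as heard from at start-up.
        self.last_seen: Dict[str, float] = {p: now for p in peers}
        self.unresponsive: Set[str] = set()

    def record_heartbeat(self, peer: str, now: float) -> None:
        """Note a periodic message received from a peer."""
        self.last_seen[peer] = now

    def check(self, now: float) -> List[str]:
        """Return peers that newly became unresponsive (a membership change)."""
        newly = sorted(
            p
            for p, seen in self.last_seen.items()
            if now - seen > self.timeout and p not in self.unresponsive
        )
        # A peer stays marked unresponsive once flagged; this sketch never
        # silently readmits it.
        self.unresponsive.update(newly)
        return newly
```

A caller would invoke `check` periodically and, for any non-empty result, send the membership change message and then apply the quorum and manager tests recited in the remaining steps.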
35. A program product comprising:
a quorum-based server power-down mechanism that determines whether a server is part of a group of servers that include a majority of servers in a cluster, and if so, the quorum-based server power-down mechanism issues a command to power down at least one server in the cluster; and
computer readable signal bearing media bearing the quorum-based server power-down mechanism.
46. A program product comprising:
(A) a cluster engine residing in the memory and executed by the at least one processor, the cluster engine handling communications between a plurality of servers in a cluster, the cluster engine comprising:
(A1) a heartbeat mechanism that sends a periodic message to the other servers in the cluster to indicate the server process is functioning properly and that receives periodic messages from the other servers in the cluster that indicate the other servers in the cluster are functioning properly;
(A2) a membership change mechanism that generates a membership change message to all servers in the cluster when any of the servers in the cluster become unresponsive; and
(A3) a quorum-based server power-down mechanism that determines whether the server process is part of a group of servers that includes a majority of servers in the cluster, and if so, the quorum-based server power-down mechanism determines whether a manager of the cluster failed when an indication of a server failure is received, and if a manager of the cluster failed, the quorum-based server power-down mechanism issues at least one command to power down all unresponsive servers in the cluster, and if a manager of the cluster did not fail, the quorum-based server power-down mechanism issues at least one command to power down a server corresponding to the received indication of server failure; and
(B) computer readable signal bearing media bearing the cluster engine.
Specification