Quorum-based power-down of unresponsive servers in a computer cluster

  • US 7,716,222 B2
  • Filed: 08/15/2008
  • Issued: 05/11/2010
  • Est. Priority Date: 11/04/2004
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for handling an unresponsive server in a cluster, the method comprising the steps of:

  • each server in the cluster sending a periodic message to other servers in the cluster to indicate proper function of the server sending the periodic message;

    each server in the cluster receiving periodic messages from other servers in the cluster that indicate the other servers in the cluster are functioning properly;

    generating a membership change message to all servers in the cluster when any of the servers in the cluster become unresponsive;

    determining whether the cluster has quorum, wherein a cluster has quorum when a majority of servers in the cluster are responsive, wherein in determining the majority of servers, if there is an odd number of servers in the cluster, each server in the cluster counts as one server, and if there is an even number of servers in the cluster, each server in the cluster that is not a manager of the cluster counts as one server and the manager of the cluster counts as two servers;

    receiving an indication of a server failure;

    if the majority of servers in the cluster are responsive, performing the steps of:

    determining whether the indication of the server failure indicates the manager of the cluster failed;

    if the manager of the cluster failed, issuing at least one command to power down all unresponsive servers in the cluster, wherein a server is powered down when the server will not become responsive in the future, wherein an unresponsive server is a server that fails to send a periodic message that indicates the server is functioning properly;

    if the manager of the cluster did not fail, issuing at least one command to power down a server corresponding to the received indication of server failure;

    determining whether the power down of the at least one of the other servers was successful;

    if the power down of the at least one of the other servers was successful, enabling the failing over any resources on the at least one of the other servers that was powered down to at least one server that is responsive; and

    if the power down of the at least one of the other servers was not successful, disabling the cluster.
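
The weighted quorum rule and the power-down decision recited in claim 1 can be sketched in a few lines of Python. This is only an illustrative reading of the claim language, not the patentee's implementation. The names `Cluster`, `weight`, `has_quorum`, `handle_failure`, and the `power_down` callback are all hypothetical, and the `"no_quorum"` return value is an assumption: the claim does not say what happens when quorum is absent, so the sketch simply takes no action in that case.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set


@dataclass(frozen=True)
class Cluster:
    """Hypothetical cluster description: member names and the manager's name."""
    servers: FrozenSet[str]
    manager: str


def weight(cluster: Cluster, server: str) -> int:
    """Each server counts as one; the manager counts as two only when
    the cluster has an even number of servers."""
    if len(cluster.servers) % 2 == 0 and server == cluster.manager:
        return 2
    return 1


def has_quorum(cluster: Cluster, responsive: Set[str]) -> bool:
    """Quorum exists when the responsive servers hold a strict majority
    of the total weight."""
    total = sum(weight(cluster, s) for s in cluster.servers)
    alive = sum(weight(cluster, s) for s in cluster.servers if s in responsive)
    return 2 * alive > total


def handle_failure(
    cluster: Cluster,
    responsive: Set[str],
    failed: str,
    power_down: Callable[[str], bool],
) -> str:
    """Return "no_quorum", "failover", or "cluster_disabled".

    power_down is a caller-supplied (hypothetical) function that tries to
    power down one server and reports whether it succeeded.
    """
    if not has_quorum(cluster, responsive):
        return "no_quorum"  # assumption: the claim is silent on this branch

    if failed == cluster.manager:
        # Manager failed: power down every unresponsive server.
        targets = sorted(cluster.servers - responsive)
    else:
        # Manager is healthy: power down only the reported server.
        targets = [failed]

    # Attempt every power-down before judging the overall outcome.
    results = [power_down(s) for s in targets]
    if all(results):
        return "failover"  # resources may now move to responsive servers
    return "cluster_disabled"
```

The doubled manager weight is what breaks ties in an even-sized cluster: with four servers split two and two, only the half that contains the manager reaches a majority (weight 3 of 5), so at most one partition can issue power-down commands.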
