Fast cluster failure detection

  • US 8,266,474 B2
  • Filed: 03/04/2010
  • Issued: 09/11/2012
  • Est. Priority Date: 12/30/2009
  • Status: Active Grant
First Claim

1. A method for fast failure detection in a distributed computer system, comprising:

  • executing a distributed computer system having a plurality of clusters comprising at least a first cluster, a second cluster and a third cluster;

    initializing failure detection by creating a connected cluster list in each of the plurality of clusters, wherein for each one of the plurality of clusters, a respective connected cluster list describes others of the plurality of clusters said each one is communicatively connected with;

    sending a status update message upon a change in connectivity between the plurality of clusters;

    generating an updated connected cluster list in each of the plurality of clusters in accordance with the status update message; and

    determining whether the change in connectivity is a result of a cluster failure by examining the updated connected cluster list in each of the plurality of clusters;

    wherein upon receiving a loss of communication status update message from the second cluster, the third cluster removes the first cluster from a connected cluster list of the second cluster, and wherein the third cluster checks a connected cluster list of the third cluster to determine whether the third cluster is connected to another cluster to which the first cluster is also connected.
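The steps of the claim can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the patented implementation: the names `Cluster`, `init_clusters`, `on_local_loss` and `on_loss_update` are hypothetical, and the final decision rule (a cluster is treated as failed once no connected cluster still lists it) is one possible reading of the "determining" step rather than something the claim spells out.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    # This cluster's own connected cluster list.
    connected: set[str] = field(default_factory=set)
    # This cluster's copy of each peer's connected cluster list.
    peer_lists: dict[str, set[str]] = field(default_factory=dict)

    def _looks_failed(self, lost: str) -> bool:
        # Assumed rule: `lost` is treated as failed when this cluster has no
        # direct connection to it and no cluster it is connected to still
        # lists `lost` in its connected cluster list.
        if lost in self.connected:
            return False
        return not any(
            lost in self.peer_lists.get(peer, set()) for peer in self.connected
        )

    def on_local_loss(self, lost: str) -> bool:
        """This cluster itself lost communication with `lost`."""
        self.connected.discard(lost)
        return self._looks_failed(lost)

    def on_loss_update(self, sender: str, lost: str) -> bool:
        """Handle a status update: `sender` lost communication with `lost`."""
        # Remove `lost` from our copy of the sender's connected cluster list.
        self.peer_lists.setdefault(sender, set()).discard(lost)
        # Check whether we are connected to another cluster that still
        # reports a connection to `lost`.
        return self._looks_failed(lost)


def init_clusters(links: list[tuple[str, str]]) -> dict[str, Cluster]:
    """Create a connected cluster list in every cluster from a list of links."""
    names = {n for link in links for n in link}
    clusters = {n: Cluster(n) for n in names}
    for a, b in links:
        clusters[a].connected.add(b)
        clusters[b].connected.add(a)
    for c in clusters.values():
        c.peer_lists = {
            n: set(clusters[n].connected) for n in names if n != c.name
        }
    return clusters
```

In a three-cluster mesh, if only the link between the first and second clusters drops, the third cluster still lists the first, so neither the second nor the third treats it as failed. If the first cluster actually goes down, both remaining clusters lose it, exchange status updates, and each then finds that no connected cluster still lists it.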
