Monitoring distributed software health and membership in a compute cluster

  • US 8,108,733 B2
  • Filed: 05/12/2010
  • Issued: 01/31/2012
  • Est. Priority Date: 05/12/2010
  • Status: Active Grant
First Claim

1. A method for monitoring health and membership of distributed software in a compute cluster having a plurality of nodes, comprising:

    generating an ordered list of all nodes in the plurality of nodes configured to operate in the compute cluster;

    making the ordered list available to each of the plurality of nodes, each of the plurality of nodes having a watchdog component configured to perform health checks and membership checks on other nodes in the compute cluster;

    performing a health check by each node in the plurality of nodes using the watchdog component, the health check comprising:

    checking a health status of a first neighbor node in a first direction of the node in the ordered list of nodes; and

    performing a first action on the neighbor node responsive to determining that the health status of the first neighbor node is unhealthy; and

    performing a membership check by each node, using the watchdog component, on a second neighbor node in a second direction opposite the first direction, the membership check comprising:

    verifying membership in the compute cluster of a second neighbor node; and

    performing a second action on the second neighbor node responsive to determining that the second neighbor node is not a member of the compute cluster;

    wherein the ordered list provides a circular sequence of nodes traversable in either the first direction or the second direction.
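
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Ring`, `Watchdog`, `run_round`, `is_healthy`, and `is_member` are hypothetical, the "first action" and "second action" are represented only by log entries because the claim does not say what they are, and the health and membership tests are stand-in set lookups. As a simplification, every listed node runs its checks, including nodes that are themselves unhealthy or not members.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Ring:
    """Ordered list of node names treated as a circular sequence."""
    nodes: List[str]

    def neighbor(self, node: str, step: int) -> str:
        # step=+1 walks the first direction, step=-1 the opposite one.
        # The modulo makes the list wrap around at both ends.
        i = self.nodes.index(node)
        return self.nodes[(i + step) % len(self.nodes)]


@dataclass
class Watchdog:
    """Hypothetical per-node watchdog component."""
    node: str
    ring: Ring
    is_healthy: Callable[[str], bool]   # assumed health probe
    is_member: Callable[[str], bool]    # assumed membership probe
    log: List[str] = field(default_factory=list)

    def health_check(self) -> None:
        # Check the neighbor in the first direction.
        target = self.ring.neighbor(self.node, +1)
        if not self.is_healthy(target):
            self.log.append(f"{self.node}: first action on {target}")

    def membership_check(self) -> None:
        # Check the neighbor in the opposite direction.
        target = self.ring.neighbor(self.node, -1)
        if not self.is_member(target):
            self.log.append(f"{self.node}: second action on {target}")


def run_round(ring: Ring, healthy: Set[str], members: Set[str]) -> List[str]:
    """Let every node in the ring run one health and one membership check."""
    log: List[str] = []
    for name in ring.nodes:
        dog = Watchdog(name, ring, healthy.__contains__, members.__contains__, log)
        dog.health_check()
        dog.membership_check()
    return log


if __name__ == "__main__":
    ring = Ring(["a", "b", "c", "d"])
    for entry in run_round(ring, healthy={"a", "b", "d"}, members={"a", "b", "c"}):
        print(entry)
```

In this example node `c` is unhealthy and node `d` is not a cluster member. Node `b` detects the first condition through its forward neighbor, and node `a` detects the second through its backward neighbor, which is reached by wrapping around the end of the list.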
