Topology manager for failure detection in a distributed computing system

  • US 10,341,168 B2
  • Filed: 04/18/2017
  • Issued: 07/02/2019
  • Est. Priority Date: 04/18/2017
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    receiving, by a topology manager of a distributed computing system, notification that a destination computing node in the distributed computing system is not responding to a communication request, the topology manager being implemented on a data partition of the distributed computing system, the distributed computing system comprising a plurality of computing nodes, the plurality of nodes comprising the destination computing node;

    determining, by the topology manager, that the destination computing node is dead and/or has a loss of communication with one or more other computing nodes in the plurality of computing nodes by querying at least a subset of other computing nodes of the plurality of computing nodes regarding liveness of the destination computing node and receiving confirmation from a quorum of the queried computing nodes;

    retiring, by the topology manager in response to the determining, the destination computing node, the retiring causing the destination computing node to become a retired computing node; and

    causing, by the topology manager, a load balancing of replicas of data partitions in the distributed computing system to compensate for loss of the retired computing node, the load balancing comprising re-assigning one or more of the replicas of data partitions among one or more surviving computing nodes in the plurality of computing nodes.
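
The four steps recited in claim 1 (notification, quorum-based liveness determination, retirement, and replica rebalancing) can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `TopologyManager`, the `probe` callback, the strict-majority quorum rule, and the least-loaded reassignment policy are all assumptions made for illustration; the claim itself does not specify a quorum size or a balancing strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class TopologyManager:
    """Illustrative sketch only; all names here are hypothetical."""
    nodes: Set[str]
    # partition id -> list of nodes currently holding a replica
    replicas: Dict[str, List[str]]
    # Assumed probe: probe(observer, target) is True if observer can reach target
    probe: Callable[[str, str], bool]
    retired: Set[str] = field(default_factory=set)

    def on_unresponsive(self, target: str) -> bool:
        """Handle a notification that `target` did not answer a request.

        Returns True if the node was retired, False otherwise.
        """
        if target not in self.nodes or target in self.retired:
            return False
        # Query the other live nodes about the target's liveness.
        observers = sorted(self.nodes - self.retired - {target})
        if not observers:
            return False
        dead_votes = sum(1 for o in observers if not self.probe(o, target))
        # Assumption: quorum means a strict majority of the queried nodes.
        quorum = len(observers) // 2 + 1
        if dead_votes < quorum:
            return False
        self.retired.add(target)   # retire the node
        self._rebalance(target)    # compensate for its loss
        return True

    def _rebalance(self, retired_node: str) -> None:
        """Move each replica held by the retired node to a surviving node."""
        survivors = sorted(self.nodes - self.retired)
        load = {n: 0 for n in survivors}
        for holders in self.replicas.values():
            for n in holders:
                if n in load:
                    load[n] += 1
        for partition in sorted(self.replicas):
            holders = self.replicas[partition]
            if retired_node not in holders:
                continue
            holders.remove(retired_node)
            # Assumed policy: least-loaded survivor without this partition.
            candidates = [n for n in survivors if n not in holders]
            if candidates:
                new_home = min(candidates, key=lambda n: (load[n], n))
                holders.append(new_home)
                load[new_home] += 1
```

In this sketch a node is retired only when a majority of the queried peers report that they cannot reach it, so a single observer with a broken link does not trigger retirement on its own.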
