Fault tolerant federation of computing clusters

  • US 10,177,994 B2
  • Filed: 08/13/2014
  • Issued: 01/08/2019
  • Est. Priority Date: 08/13/2014
  • Status: Active Grant
First Claim

1. At a computer system that includes at least one processor, a computer-implemented method for facilitating communication and maximizing an efficiency of directed work flow between computing nodes in a cluster federation, the method comprising:

    an act of identifying a plurality of computing nodes that are to be a part of the cluster federation, the cluster federation including a master cluster and a worker cluster, wherein the master cluster includes a first master node and a second master node, and wherein the worker cluster includes a worker node;

    an act of assigning a director role to the first master node, which first master node is included in the master cluster, wherein the first master node, after being assigned the director role, governs decisions that affect consistency within the cluster federation;

    an act of assigning a leader role to the second master node, which second master node is also included in the master cluster, wherein the second master node, after being assigned the leader role, monitors and controls the worker node in the worker cluster, whereby the master cluster includes the first master node having the director role and the second master node having the leader role, the first master node being different than the second master node, the director role being different than the leader role;

    an act of maintaining a partitioned database that is usable to facilitate communication between the first master node and the second master node, wherein any particular entry in the partitioned database is changeable by only one node included within the master cluster such that the communication between the first master node and the second master node occurs without acquiring a database lock;

    an act of assigning a worker agent role to the worker node, wherein the worker node, after being assigned the worker agent role, receives and processes workload assignments from the master cluster; and

    after waiting a predetermined time interval during which a status update from the worker agent role is not received, an act of the leader role communicating to the director role a failure of the worker agent role by recording the failure in the partitioned database, whereby the leader role communicates workload failures to the director role via the partitioned database.
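
The single-writer database and the timeout-based failure report recited above can be illustrated with a minimal, single-process sketch. It is not the patented implementation: the class names (`PartitionedStore`, `Leader`, `Director`), the key layout `failure/<leader>/<worker>`, the node identifiers, and the rule that the first writer of a key becomes its owner are all assumptions made for illustration. Timestamps are passed in explicitly so the behavior is deterministic.

```python
class PartitionedStore:
    """Hypothetical store in which each entry is writable by exactly one node."""

    def __init__(self):
        self._owner = {}  # key -> id of the only node allowed to change it
        self._value = {}  # key -> current value

    def write(self, node_id, key, value):
        # Assumption: the first node to write a key becomes its owner.
        owner = self._owner.setdefault(key, node_id)
        if owner != node_id:
            raise PermissionError(f"{key!r} is owned by {owner!r}")
        # One writer per entry, so no database lock is taken here.
        self._value[key] = value

    def read(self, key, default=None):
        return self._value.get(key, default)


class Leader:
    """Monitors workers and records failures in its own entries of the store."""

    def __init__(self, node_id, store, timeout_s):
        self.node_id = node_id
        self.store = store
        self.timeout_s = timeout_s
        self.last_seen = {}  # worker id -> time of last status update

    def on_status_update(self, worker_id, now):
        self.last_seen[worker_id] = now

    def check_workers(self, now):
        """Record a failure for every worker that has been silent too long."""
        failed = []
        for worker_id, seen in self.last_seen.items():
            if now - seen > self.timeout_s:
                key = f"failure/{self.node_id}/{worker_id}"
                self.store.write(self.node_id, key, {"detected_at": now})
                failed.append(worker_id)
        return failed


class Director:
    """Learns about failures only by reading the leader's entries."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store

    def failed_workers(self, leader_id, worker_ids):
        return [
            w for w in worker_ids
            if self.store.read(f"failure/{leader_id}/{w}") is not None
        ]


store = PartitionedStore()
leader = Leader("master-2", store, timeout_s=5.0)
director = Director("master-1", store)

leader.on_status_update("worker-1", now=0.0)
early = leader.check_workers(now=3.0)   # within the interval: nothing recorded
late = leader.check_workers(now=6.0)    # interval exceeded: failure recorded
seen_by_director = director.failed_workers("master-2", ["worker-1"])
```

In this sketch the director never writes to the leader's entries; an attempt to do so raises `PermissionError`. Because each entry has a single writer, the two roles exchange information through the store without contending for the same record.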
