Fault tolerant federation of computing clusters

US 10,177,994 B2
Filed: 08/13/2014
Issued: 01/08/2019
Est. Priority Date: 08/13/2014
Status: Active Grant

Abstract

Embodiments are directed to organizing computing nodes in a cluster federation and to reassigning roles in a cluster federation. In one scenario, a computer system identifies computing nodes that are to be part of a cluster federation which includes a master cluster and worker clusters. The computer system assigns a director role to a master node in the master cluster which governs decisions that affect consistency within the federation, and further assigns a leader role to at least one master node which monitors and controls other master nodes in the master cluster. The computer system assigns a worker agent role to a worker node which receives workload assignments from the master cluster, and further assigns a worker role to a worker node which processes the assigned workload. The organized cluster federation provides fault tolerance by allowing roles to be dynamically reassigned to computing nodes in different master and worker clusters.
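
The role layout in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation. The `Role`, `Node`, and `organize_federation` names are hypothetical. So is the choice to give the director role to the first master listed and the worker agent role to the first worker listed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    DIRECTOR = "director"          # governs consistency decisions
    LEADER = "leader"              # monitors and controls other nodes
    WORKER_AGENT = "worker_agent"  # receives workload assignments
    WORKER = "worker"              # processes assigned workloads


@dataclass
class Node:
    name: str
    cluster: str                   # "master" or a worker cluster name (assumed labels)
    roles: set = field(default_factory=set)


def organize_federation(masters, workers):
    """Assign roles to already-identified nodes (hypothetical policy)."""
    if len(masters) < 2:
        raise ValueError("sketch assumes at least two master nodes")
    if not workers:
        raise ValueError("sketch assumes at least one worker node")
    # Assumption: the first master becomes director, the rest become leaders.
    masters[0].roles.add(Role.DIRECTOR)
    for master in masters[1:]:
        master.roles.add(Role.LEADER)
    # Assumption: the first worker also acts as the cluster's worker agent.
    workers[0].roles.add(Role.WORKER_AGENT)
    for worker in workers:
        worker.roles.add(Role.WORKER)
    return masters + workers
```

Because roles are kept in a set on each node rather than fixed by node type, a role can later be removed from one node and added to another, which is the kind of dynamic reassignment the abstract describes.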

21 Claims

1. At a computer system that includes at least one processor, a computer-implemented method for facilitating communication and maximizing an efficiency of directed work flow between computing nodes in a cluster federation, the method comprising:
- an act of identifying a plurality of computing nodes that are to be a part of the cluster federation, the cluster federation including a master cluster and a worker cluster, wherein the master cluster includes a first master node and a second master node, and wherein the worker cluster includes a worker node;
  
  an act of assigning a director role to the first master node, which first master node is included in the master cluster, wherein the first master node, after being assigned the director role, governs decisions that affect consistency within the cluster federation;
  
  an act of assigning a leader role to the second master node, which second master node is also included in the master cluster, wherein the second master node, after being assigned the leader role, monitors and controls the worker node in the worker cluster, whereby the master cluster includes the first master node having the director role and the second master node having the leader role, the first master node being different than the second master node, the director role being different than the leader role;
  
  an act of maintaining a partitioned database that is usable to facilitate communication between the first master node and the second master node, wherein any particular entry in the partitioned database is changeable by only one node included within the master cluster such that the communication between the first master node and the second master node occurs without acquiring a database lock;
  
  an act of assigning a worker agent role to the worker node, wherein the worker node, after being assigned the worker agent role, receives and processes workload assignments from the master cluster; and
  
  after waiting a predetermined time interval during which a status update from the worker agent role is not received, an act of the leader role communicating to the director role a failure of the worker agent role by recording the failure in the partitioned database, whereby the leader role communicates workload failures to the director role via the partitioned database.
- Dependent claims:
  - 2. The method of claim 1, wherein the master cluster comprises a plurality of master agents running on one or more selected worker clusters.
  - 3. The method of claim 1, wherein the worker node is configured to store assigned workloads such that the worker node continues to operate even when the master cluster is unavailable.
  - 4. The method of claim 1, wherein one or more workload assignments are hosted on the second master node in the master cluster.
  - 5. The method of claim 1, wherein a functionality of at least some nodes that are included within the master cluster is distributed among a plurality of master nodes in the master cluster.
  - 6. The method of claim 5, wherein the functionality is distributed along physical hardware boundaries to reduce network traffic between computing systems.
  - 7. The method of claim 1, wherein the worker node, after being assigned the worker agent role, communicates with a master agent on behalf of the worker cluster, and wherein the communication between the worker node and the master agent includes providing workload assignment status updates and receiving additional workload assignments.
  - 8. The method of claim 1, wherein the cluster federation allows master nodes to host worker agents and further allows worker nodes to host master agents.
  - 9. The method of claim 1, wherein worker nodes in the worker cluster transfer assigned workloads between other worker nodes in the worker cluster, such that inter-cluster workload transfers are performed without the master node having knowledge of the inter-cluster workload transfers.
  - 10. The method of claim 1, further comprising:
    - an act of determining that the first master node, after being assigned the director role, has become unavailable; and
      
      an act of failing over to a third master node in the master cluster, such that the director role is transferred to and run on the third master node.
  - 11. The method of claim 1, wherein the first master node, after being assigned the director role, determines whether sufficient master or worker nodes are available in the cluster federation to maintain a fault tolerance service level agreement (SLA).
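
Two mechanisms in claim 1 lend themselves to a short illustration: a partitioned database in which each entry has a single writer, and a leader that records a worker agent failure after a timeout with no status update. The Python sketch below is a simplified, single-process model under stated assumptions, not the patented implementation. The class names, the `failure/<leader>/<agent>` key layout, and the injectable `clock` parameter are all hypothetical.

```python
import time


class PartitionedDatabase:
    """Toy store in which every entry has exactly one writer (its owner)."""

    def __init__(self):
        self._owner = {}  # key -> name of the only node allowed to write it
        self._data = {}

    def write(self, node, key, value):
        # Assumption: the first writer of a key becomes its owner.
        owner = self._owner.setdefault(key, node)
        if owner != node:
            raise PermissionError(f"{key!r} is owned by {owner}, not {node}")
        self._data[key] = value

    def read(self, key):
        # Any node may read any entry.
        return self._data.get(key)


class Leader:
    """Tracks worker agent status updates and records timeouts as failures."""

    def __init__(self, name, db, timeout_s, clock=time.monotonic):
        self.name = name
        self.db = db
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}

    def record_status(self, worker_agent):
        self.last_seen[worker_agent] = self.clock()

    def check_timeouts(self):
        """Record a failure entry for every agent silent for too long."""
        now = self.clock()
        failed = []
        for agent, seen in self.last_seen.items():
            if now - seen > self.timeout_s:
                # The key lives in this leader's own partition (assumed layout).
                self.db.write(self.name, f"failure/{self.name}/{agent}", now)
                failed.append(agent)
        return failed
```

In this model a director would learn of the failure by reading `failure/<leader>/<agent>`. Since no two nodes ever write the same entry, writers never contend for it, which is the property the claim relies on to avoid a database lock. A real distributed store would need much more than this in-memory dictionary.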

12. At a computer system that includes at least one processor, a computer-implemented method for facilitating communication and maximizing an efficiency when reassigning roles in a cluster federation, the method comprising:
- an act of identifying a cluster federation that includes a master cluster and a worker cluster, wherein the master cluster includes a plurality of master nodes that each have corresponding master node functionalities, and wherein the worker cluster includes a worker node, wherein a first master node and a second master node are included in the plurality of master nodes, the first master node being assigned a director role, the second master node being assigned a leader role, whereby the master cluster includes both the first master node having the director role and the second master node having the leader role, the first master node being different than the second master node, the director role being different than the leader role;
  
  an act of maintaining a partitioned database that is usable to facilitate communication between at least two master nodes included within the plurality, wherein any particular entry in the partitioned database is changeable by only one master node included within the plurality such that the communication between the at least two master nodes occurs without acquiring a database lock;
  
  an act of setting a policy requirement for the cluster federation, wherein the policy requirement at least requires that the cluster federation maintain a specified number of master nodes in the master cluster;
  
  an act of determining that a current number of master nodes included within the master cluster is below the specified number of master nodes such that the cluster federation is not meeting the policy requirement;
  
  an act of determining that the worker node is available for reassignment;
  
  an act of reassigning the worker node to become a new master node, such that the worker node, after being reassigned to become the new master node, adopts the master node functionalities;
  
  an act of the new master node transmitting a workload assignment to a worker agent role in the worker cluster; and
  
  after waiting a predetermined time interval during which a status update from the worker agent role is not received, an act of the new master node communicating a failure of the worker agent role by recording the failure in the partitioned database, whereby the new master node communicates workload failures via the partitioned database.
- Dependent claims:
  - 13. The method of claim 12, further comprising:
    - an act of determining that the worker node, after being reassigned to become the new master node, has become unhealthy; and
      
      an act of demoting the new master node to become a new worker node.
  - 14. The method of claim 12, wherein the worker node is selected from a plurality of worker nodes that are within a same hardware rack as the first and second master nodes.
  - 15. The method of claim 12, wherein determining that the current number of master nodes included within the master cluster is below the specified number of master nodes comprises determining that the current number of master nodes is lower than a number that is specified by a service level agreement (SLA), and wherein the reassignment of the worker node to become the new master node complies with requirements of the SLA.
  - 16. The method of claim 12, wherein the master cluster is geographically distributed over a plurality of federation clusters.
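
The policy check and promotion steps of claim 12, together with the demotion in claim 13, can be sketched roughly as follows. This is a hedged illustration, not the patented implementation. Nodes are plain strings, and the function names and the `is_available` and `is_healthy` callbacks are hypothetical stand-ins for whatever availability and health checks a real system would use.

```python
def enforce_master_policy(masters, workers, required_masters, is_available):
    """Promote available workers until the master count meets the policy.

    Mutates `masters` and `workers` in place and returns the promoted nodes.
    """
    promoted = []
    while len(masters) < required_masters:
        # Pick the first worker that reports itself available (assumed rule).
        candidate = next((w for w in workers if is_available(w)), None)
        if candidate is None:
            break  # policy cannot be met with the current workers
        workers.remove(candidate)
        masters.append(candidate)
        promoted.append(candidate)
    return promoted


def demote_if_unhealthy(node, masters, workers, is_healthy):
    """Return a master to the worker pool if it has become unhealthy."""
    if node in masters and not is_healthy(node):
        masters.remove(node)
        workers.append(node)
        return True
    return False
```

A caller would typically run `enforce_master_policy` again after any demotion, since demoting a master can put the federation back below the required count.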

17. A computer system comprising the following:
- one or more processors; and
  
  one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors and that cause the computer system to perform a method for facilitating communication and maximizing an efficiency when reassigning roles in a cluster federation, the method comprising the following:
  
  an act of identifying a cluster federation that includes a master cluster and a worker cluster, wherein the master cluster includes a plurality of master nodes that each include corresponding master node functionalities, and wherein the worker cluster includes a worker node with corresponding worker node functionalities, wherein a first master node and a second master node are included in the plurality of master nodes, the first master node being assigned a director role, the second master node being assigned a leader role, whereby the master cluster includes both the first master node having the director role and the second master node having the leader role, the first master node being different than the second master node, the director role being different than the leader role;
  
  an act of maintaining a partitioned database that is usable to facilitate communication between at least two master nodes included within the plurality, wherein any particular entry in the partitioned database is changeable by only one master node included within the plurality such that the communication between the at least two master nodes occurs without acquiring a database lock;
  
  an act of setting a policy requirement for the cluster federation, wherein the policy requirement at least requires that the cluster federation maintain a specified number of worker nodes in the worker cluster;
  
  an act of determining that a current number of worker nodes included within the worker cluster is below the specified number of worker nodes such that the cluster federation is not meeting the policy requirement;
  
  an act of determining that at least one master node in the plurality of master nodes is available for reassignment;
  
  an act of reassigning the at least one master node to become a new worker node, such that the at least one master node, after being reassigned to become the new worker node, adopts the worker node functionalities;
  
  an act of a different master node transmitting a workload assignment to the new worker node; and
  
  after waiting a predetermined time interval during which a status update from the new worker node is not received, an act of the different master node communicating a failure of the new worker node by recording the failure in the partitioned database, whereby the different master node communicates workload failures via the partitioned database.
- Dependent claims:
  - 18. The computer system of claim 17, wherein the method further comprises:
    - an act of implementing a plurality of fault tolerant leader agents that are configured to make independent decisions regarding node promotion and demotion within a scope of decisions that a particular leader agent is responsible for.
  - 19. The computer system of claim 17, wherein the method further comprises:
    - an act of implementing one or more master agents that are configured to record promotion and demotion decisions in the partitioned database.
  - 20. The computer system of claim 17, wherein determining that the at least one master node is available for reassignment comprises:
    - identifying that the at least one master node has available processing resources such that the at least one master node is characterized as being light on work.
  - 21. The computer system of claim 17, wherein the master cluster is configured to issue a notification upon detecting that one or more master nodes in the master cluster are undergoing a down event or an up event, the notification being issued by the master cluster and being delivered to the first master node that was assigned the director role.
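
Claim 17 runs the reassignment in the opposite direction, and claim 20 describes the reassigned master as one that is "light on work". One way to picture that selection is the Python sketch below. It is an assumption-laden illustration rather than the patented method. The load values between 0 and 1, the `threshold` of 0.25, and the `protected` set (used here to keep the director from being reassigned) are all hypothetical.

```python
def pick_light_master(master_load, threshold=0.25, protected=()):
    """Return the least-loaded master at or below the threshold, or None."""
    candidates = {
        name: load
        for name, load in master_load.items()
        if load <= threshold and name not in protected
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def enforce_worker_policy(master_load, workers, required_workers, protected=()):
    """Reassign lightly loaded masters until the worker count meets the policy.

    Mutates `master_load` and `workers` in place and returns the moved nodes.
    """
    reassigned = []
    while len(workers) < required_workers:
        node = pick_light_master(master_load, protected=protected)
        if node is None:
            break  # no master is light enough to give up
        del master_load[node]  # the node leaves the master cluster
        workers.append(node)   # and adopts worker functionality
        reassigned.append(node)
    return reassigned
```

For example, with loads `{"m1": 0.9, "m2": 0.1, "m3": 0.2}`, one existing worker, a requirement of three workers, and `m1` protected, the sketch reassigns `m2` and then `m3`.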

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Nishanov, Gor; D'Amato, Andrea; Dion, David Allen; Tamhane, Amitabh Prakash; Koppolu, Lokesh Srinivas; Maliwacki, Nicholas
Primary Examiner(s): Dollinger, Tonia L
Assistant Examiner(s): Sayoc, Kristoffer L S

Application Number: US14/459,066
Publication Number: US 20160050123A1
Time in Patent Office: 1,609 Days
Field of Search: 709223, 709208, 709209
CPC Class Codes:
G06F 9/5061   Partitioning or combining o...
H04L 41/5003   Managing SLA; Interaction b...
H04L 67/10   in which an application is ...
