High performance computing system for distributed applications over a computer
First Claim
1. A method for solving consensus based problems in a distributed computer network, the distributed computer network having at least one distributed consensus client process proposing a consensus based problem, and at least one distributed consensus server process for solving the consensus based problem, the method comprising the steps of:
- transporting communication messages among distributed processes through an event bus;
using at least one priority failure detector to detect failed processes based on heartbeat status messages, wherein using the priority failure detector includes;
assigning all processes in the consensus session with a set of Predetermined priority levels;
adjusting a frequency of a periodic heartbeat status message sent by each prioritized process based on its assigned Priority level, wherein an amount of traffic on the event bus may be minimized by adjusting the frequency; and
detecting a failure of the prioritized process based on the frequency controlled heartbeat status message; and
achieving a universal consensus result by providing a distributed consensus service through an asynchronous consensus session with the assistance of the event bus and the priority failure detector, wherein any failure in a single process in the distributed computer network does not affect any other process, thus providing a fault tolerant and reliable network mechanism.
7 Assignments
0 Petitions
Accused Products
Abstract
A system that includes one or more priority failure detectors may be included that detect node or process failures in the distributed computer network. The system has a fault-tolerant, client-server architecture where a client process presents a particular consensus problem to one or more server processes to solve such a consensus based problem. The system assigns priority levels to processes involved in a consensus session and controls the frequencies of their heartbeat status messages based on their respective priority levels. By controlling the frequencies, the reliability of the network is enhanced and the overall message load on the even bus is reduced to a minimum number. The system also discloses a name service that assigns unique logical identities to all processes in a consensus session. Further, by tagging all involved processes appropriately, multiple consensus based problems can be dealt with on a set of consensus server processes simultaneously.
-
Citations
51 Claims
-
1. A method for solving consensus based problems in a distributed computer network, the distributed computer network having at least one distributed consensus client process proposing a consensus based problem, and at least one distributed consensus server process for solving the consensus based problem, the method comprising the steps of:
-
transporting communication messages among distributed processes through an event bus;
using at least one priority failure detector to detect failed processes based on heartbeat status messages, wherein using the priority failure detector includes;
assigning all processes in the consensus session with a set of Predetermined priority levels;
adjusting a frequency of a periodic heartbeat status message sent by each prioritized process based on its assigned Priority level, wherein an amount of traffic on the event bus may be minimized by adjusting the frequency; and
detecting a failure of the prioritized process based on the frequency controlled heartbeat status message; and
achieving a universal consensus result by providing a distributed consensus service through an asynchronous consensus session with the assistance of the event bus and the priority failure detector, wherein any failure in a single process in the distributed computer network does not affect any other process, thus providing a fault tolerant and reliable network mechanism. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 51)
registering all related distributed consensus server processes with the name service upon initiation of the distributed consensus service; and
assigning the unique logical identity to each distributed consensus server process only after all the distributed consensus server processes have completed the registration.
-
-
4. The method of claim 1 wherein the step of adjusting further includes the step of assigning a high frequency for a heartbeat status message for a process of a high priority level so that reliability of the distributed computer network is enhanced.
-
5. The method of claim 1 wherein the step of adjusting further includes the step of assigning a low frequency for a heartbeat status message for a process of a low priority level so that the overall message communication on the event bus can be minimized.
-
6. The method of claim 1 wherein the event bus offers different Quality-of-Services for different consensus based protocols based on a needed reliability level of the communication.
-
7. The method of claim 1 wherein the step of achieving a universal consensus result further includes the steps of:
-
initiating a consensus problem from a client process by sending a proposed value to a group of related processes with the assistance of at least one agreement protocol;
filtering the consensus problem in order to customize the consensus service in a run time fashion;
solving the consensus problem by all consensus server processes;
passing the consensus result solved to the agreement protocol; and
distributing the consensus result to all related processes of the agreement protocol, wherein the consensus server processes are detached from any client process through the step of filtering.
-
-
8. The method of claim 7 wherein a predetermined set of client processes communicate only with a designated consensus server process.
-
9. The method of claim 7 wherein a predetermined set of client processes communicate with a first designated consensus server process and a second designated consensus server process, wherein the second designated consensus server process is a backup server process for the first designated consensus server process.
-
10. The method of claim 7 further includes a consensus coordinator that manages all related processes for solving the consensus problems.
-
11. The method of claim 7 wherein the consensus coordinator manages multiple consensus problems concurrently.
-
51. The method of claim 1 wherein using the priority failure detector further comprises:
-
placing the prioritized process for which the failure was detected on a suspect list;
removing the prioritized process from the suspect list after a predetermined period of time elapses during which no additional failures of the prioritized process were detected; and
extending a timeout period associated with the prioritized process if no additional failures of the prioritized process were detected, wherein the timeout period is used in detecting a failure of the prioritized process.
-
-
12. A method for locking two or more distributed objects using consensus based protocols in a distributed computer network, the method comprising the steps of:
-
broadcasting a request to at least one resource object for a lock on one or more proposed objects by an initiating distributed object using a consensus based protocol; and
deciding on the proposed objects through a consensus service session for the resource objects to grant the lock to the one or more proposed objects in a predetermined order, wherein any failure in a single resource object in the distributed computer network does not affect the other objects, thus providing a fault tolerant and reliable mechanism. - View Dependent Claims (13, 14, 15)
-
-
16. A method for detecting process failures in a distributed computer network, the method comprising the steps of:
-
requiring a heartbeat status message to be sent periodically from a first process to a second process;
assigning a priority level to the first process in the distributed computer network;
controlling the frequency of the heartbeat status message of the first process according to its priority level; and
detecting any process failure of the first process by the second process based on the reception of the heartbeat status message. - View Dependent Claims (17, 18, 19, 20, 21)
increasing the frequency of the heartbeat status message if the first process has a high priority level; and
reducing the frequency of the heartbeat status message if the first process has a low priority level.
-
-
20. The method of claim 16 wherein the heartbeat status messages can be any application messages.
-
21. The method of claim 16 wherein the step of controlling further includes a step of controlling the frequency of the heartbeat status message based on the reliability of both the first process and the network.
-
22. A method for solving multiple consensus based problems concurrently in a distributed computer network, the distributed computer network having one or more distributed consensus client processes proposing one or more consensus based problems, and one or more distributed consensus server processes for solving the consensus based problems concurrently, the method comprising the steps of:
-
transporting communication messages among distributed processes through an event bus;
using one or more priority failure detectors to detect failed processes based on their predetermined priority levels using heartbeat status messages, wherein each of the heartbeat status messages is associated with a frequency based on an assigned priority level of the heartbeat status message, so that an amount of traffic on the event bus may be minimized by adjusting the frequency of each heartbeat status message; and
solving the consensus based problems by providing a distributed consensus service through one or more asynchronous consensus sessions with the assistance of the event bus and the priority failure detectors, wherein the client and server processes are tagged for their respective consensus sessions, and any failure in a single process in the distributed computer network does not affect any other process, thus providing a fault tolerant and reliable mechanism. - View Dependent Claims (23, 24, 25)
-
-
26. A system for solving consensus based problems in a distributed computer network, the distributed computer network having at least one distributed consensus client process proposing a consensus based problem, and at least one distributed consensus server process for solving the consensus based problem, the system comprising the steps of:
-
an event bus for transporting communication messages among distributed processes;
at least one priority failure detector for detecting failed processes based on prioritized status messages, wherein the priority failure detector includes;
instructions for assigning all processes in the consensus session with a set of predetermined priority levels;
instructions for adjusting a frequency of a periodic heartbeat status message sent by each prioritized process based on its assigned priority level; and
instructions for detecting a failure of the prioritized process based on the frequency controlled heartbeat status message; and
a distributed consensus service for achieving a universal consensus result through an asynchronous consensus session with the assistance of the event bus and the at least one priority failure detector, wherein any failure in a single process in the distributed computer network does not affect any other process, thus providing a fault tolerant and reliable mechanism. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
instructions for registering all related distributed consensus server processes with the name service upon initiation of the distributed consensus service; and
instructions for assigning the unique logical identity to each distributed consensus server process only after all the distributed consensus server processes have completed the registration.
-
-
29. The system of claim 26 wherein the instructions for adjusting further includes the instructions for assigning a high frequency for a heartbeat status message for a process of a high priority level so that reliability of the distributed computer network is enhanced.
-
30. The system of claim 26 wherein the instructions for adjusting further includes the instructions for assigning a low frequency for a heartbeat status message for a process of a low priority level so that the overall message communication on the event bus can be minimized.
-
31. The system of claim 26 wherein the event bus offers different Quality-of-Services for different consensus based protocols based on a needed reliability of the communication.
-
32. The system of claim 26 wherein the distributed consensus service further includes the instructions for:
-
initiating a consensus problem from a client process by sending a proposed value to a group of related processes with the assistance of at least one agreement protocol;
filtering the consensus problem in order to customize the consensus service in a run time fashion;
solving the consensus problem by all consensus server processes;
passing the consensus result solved to the agreement protocol; and
distributing the consensus result to all related processes of the agreement protocol, wherein the consensus server processes are detached from any client process through the step of filtering.
-
-
33. The system of claim 32 wherein a predetermined set of client processes communicate only with a designated consensus server process.
-
34. The system of claim 32 wherein a predetermined set of client processes communicate with a first designated consensus server process and a second designated consensus server process, wherein the second designated consensus server process is a backup server process for the first designated consensus server process.
-
35. The system of claim 32 wherein the distributed consensus service further includes a consensus coordinator that manages all related processes for solving the consensus problems.
-
36. The system of claim 32 wherein the consensus coordinator manages multiple consensus problems concurrently.
-
37. A system for locking two or more distributed objects using consensus based protocols in a distributed computer network, the system comprising the instructions for:
-
broadcasting a request to at least one resource object for a lock on at least one proposed object by an initiating distributed object using a consensus based protocol; and
deciding on the proposed object through a consensus service session for the resource objects to grant the lock to the at least one proposed object in a predetermined order, wherein any failure in a single resource object in the distributed computer network does not affect the other objects, thus providing a fault tolerant and reliable mechanism. - View Dependent Claims (38, 39, 40)
-
-
41. A system for detecting process failures in a distributed computer network, the system comprising:
-
a heartbeat status message sent periodically from a first process to a second process;
instructions for assigning a priority level to the first process in the distributed computer network;
instructions for controlling the frequency of the heartbeat status message of the first process according to its priority level; and
instructions for detecting any process failure of the first process by the second process based on the reception of the heartbeat status message. - View Dependent Claims (42, 43, 44, 45, 46)
increasing the frequency of the heartbeat status message if the first process has a high priority level; and
reducing the frequency of the heartbeat status message if the first process has a low priority level.
-
-
45. The system of claim 41 wherein the heartbeat status messages can be any application messages.
-
46. The system of claim 41 wherein the instructions for controlling further includes the instructions for controlling the frequency of the heartbeat status message based on the reliability of both the first process and the network.
-
47. A system for solving multiple consensus based problems concurrently in a distributed computer network, the distributed computer network having one or more distributed consensus client processes proposing one or more consensus based problems, and one or more distributed consensus server processes for solving the consensus based problems concurrently, the system comprising:
-
an event bus for transporting communication messages among distributed processes;
one or more priority failure detectors to detect failed processes based on their predetermined priority levels using heartbeat status messages, wherein each of the heartbeat status messages is associated with a frequency based on an assigned priority level of the heartbeat status message, so that an amount of traffic on the event bus may be minimized by adjusting the frequency of each heartbeat status message; and
a distributed consensus service for solving the consensus based problems through one or more asynchronous consensus sessions with the assistance of the event bus and the priority failure detectors, wherein the client and server processes are tagged for their respective consensus sessions, and any failure in a single process in the distributed computer network does not affect any other process, thus providing a fault tolerant and reliable mechanism. - View Dependent Claims (48, 49, 50)
-
Specification