Dynamic application instance discovery and state management within a distributed system
First Claim
1. A distributed system, comprising:
a plurality of computing devices configured to implement:
a plurality of application instances configured to perform functions of the distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system; and
a plurality of discovery and failure detection daemon (DFDD) instances, wherein the plurality of DFDD instances are configured to store operational state information for the plurality of application instances, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application instance and wherein at least one of the DFDD instances is configured to update the global state information according to a global state machine defining transitions between a plurality of global states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to one or more status reports received from the respective application instance;
wherein at least one of the plurality of DFDD instances is configured to repeatedly execute a peer-to-peer, gossip-based synchronization protocol with a peer instance of the DFDD instances, wherein the peer instance is randomly or pseudorandomly selected from among the plurality of DFDD instances, and wherein to execute the protocol, the peer DFDD instances are configured to exchange state information for at least one of the plurality of application instances including both the global state information and the specific state information.
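The global state machine recited in this claim can be illustrated with a minimal Python sketch. The state identifiers, the event names, and the transition table below are assumptions chosen for illustration only; the claim names the five kinds of state but does not fix which transitions are permitted or what triggers them.

```python
from enum import Enum


class GlobalState(Enum):
    # Identifiers are hypothetical labels for the five claimed states.
    NEW = "newly online"
    OK = "operating normally"
    INCOMMUNICADO = "lost communication with its DFDD instance"
    FAILED = "failed"
    NETWORK_SPLIT = "subject to a network split"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (GlobalState.NEW, "heartbeat"): GlobalState.OK,
    (GlobalState.OK, "heartbeat"): GlobalState.OK,
    (GlobalState.OK, "heartbeat_missed"): GlobalState.INCOMMUNICADO,
    (GlobalState.INCOMMUNICADO, "heartbeat"): GlobalState.OK,
    (GlobalState.INCOMMUNICADO, "failure_timeout"): GlobalState.FAILED,
    (GlobalState.INCOMMUNICADO, "split_detected"): GlobalState.NETWORK_SPLIT,
    (GlobalState.NETWORK_SPLIT, "heartbeat"): GlobalState.OK,
}


def next_state(current, event):
    """Return the next global state; stay put if the event does not apply."""
    return TRANSITIONS.get((current, event), current)


# Drive one instance through a sequence of (assumed) status events.
state = GlobalState.NEW
for event in ["heartbeat", "heartbeat_missed", "failure_timeout"]:
    state = next_state(state, event)
print(state.name)  # FAILED
```

Keeping the transitions in a lookup table rather than in branching logic makes it easy to swap in whatever transition rules a real deployment uses.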
Abstract
Dynamic application instance discovery and state management within a distributed system. A distributed system may implement application instances configured to perform one or more application functions within the distributed system, and discovery and failure detection daemon (DFDD) instances, each configured to store an indication of a respective operational state of each member of a respective group of the number of application instances. Each of the DFDD instances may repeatedly execute a gossip-based synchronization protocol with another one of the DFDD instances, where execution of the protocol between DFDD instances includes reconciling differences among membership of the respective groups of application instances. A new application instance may be configured to notify a particular DFDD instance of its availability to perform an application function. The particular DFDD instance may be configured to propagate the new instance's availability to other DFDD instances via execution of the synchronization protocol, without intervention on the part of the new application instance.
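The propagation described in the abstract can be sketched as a toy simulation in Python. Everything named here is an assumption made for illustration: the `Dfdd` class, the per-entry version counter used to decide which copy wins, and the rule that each daemon contacts one pseudorandom peer per round. The abstract itself does not specify how differences are reconciled.

```python
import random


class Dfdd:
    """Toy DFDD instance: maps application instance id -> (version, state)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def register(self, app_id, state="NEW"):
        # A new application instance notifies only this one DFDD instance.
        self.entries[app_id] = (1, state)

    def gossip_with(self, peer):
        # Reconcile membership in both directions: each side keeps the
        # higher-versioned entry and learns about instances it did not know.
        for app_id in set(self.entries) | set(peer.entries):
            mine = self.entries.get(app_id, (0, None))
            theirs = peer.entries.get(app_id, (0, None))
            newest = max(mine, theirs, key=lambda entry: entry[0])
            self.entries[app_id] = newest
            peer.entries[app_id] = newest


def gossip_round(daemons, rng):
    """Each daemon synchronizes with one pseudorandomly selected peer."""
    for daemon in daemons:
        peer = rng.choice([d for d in daemons if d is not daemon])
        daemon.gossip_with(peer)


rng = random.Random(0)
daemons = [Dfdd(f"dfdd-{i}") for i in range(5)]
daemons[0].register("storage-node-7")  # the only notification the instance sends

rounds = 0
while rounds < 100 and not all("storage-node-7" in d.entries for d in daemons):
    gossip_round(daemons, rng)
    rounds += 1
print(f"{len(daemons)} daemons know the new instance after {rounds} round(s)")
```

The new instance calls `register` once and is never involved again; the daemons spread its availability among themselves, which is the behavior the abstract describes.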
20 Claims
1. A distributed system, comprising:
a plurality of computing devices configured to implement:
a plurality of application instances configured to perform functions of the distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system; and
a plurality of discovery and failure detection daemon (DFDD) instances, wherein the plurality of DFDD instances are configured to store operational state information for the plurality of application instances, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application instance and wherein at least one of the DFDD instances is configured to update the global state information according to a global state machine defining transitions between a plurality of global states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to one or more status reports received from the respective application instance;
wherein at least one of the plurality of DFDD instances is configured to repeatedly execute a peer-to-peer, gossip-based synchronization protocol with a peer instance of the DFDD instances, wherein the peer instance is randomly or pseudorandomly selected from among the plurality of DFDD instances, and wherein to execute the protocol, the peer DFDD instances are configured to exchange state information for at least one of the plurality of application instances including both the global state information and the specific state information.
Dependent claims: 2, 3, 4, 14, 15, 16
5. A method, comprising:
storing, by a plurality of discovery and failure detection daemon (DFDD) instances implemented on a plurality of computing devices, state information for a plurality of application instances configured to perform functions of a distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application and wherein the state information includes global state information according to a global state machine defining transition between a plurality of global states including a state indicating a respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to status reports of the application instance;
randomly or pseudorandomly selecting, by at least one of the plurality of DFDD instances, a peer instance of the plurality of DFDD instances; and
communicating, by the at least one of the plurality of DFDD instances, state information for one or more of the plurality of application instances to the other DFDD instance according to a peer-to-peer synchronization protocol, the state information including both the global state information and the specific state information.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 17
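The three steps of this method (storing, selecting a peer, communicating) can be sketched in Python with a record that carries both kinds of state. The `StateEntry` layout, the application type names, the daemon identifiers, and the dictionary message format are all hypothetical; the claim requires only that global and type-specific state travel together to a pseudorandomly selected peer.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StateEntry:
    app_type: str       # hypothetical type name, e.g. "storage" or "index"
    global_state: str   # meaning is common to every application type
    specific_state: dict = field(default_factory=dict)  # meaning depends on app_type


def select_peer(self_id, daemon_ids, rng):
    """Pseudorandomly pick a peer DFDD instance other than ourselves."""
    return rng.choice([d for d in daemon_ids if d != self_id])


def build_message(entries):
    """Package entries so global and specific state are sent together."""
    return {
        app_id: {
            "type": entry.app_type,
            "global": entry.global_state,
            "specific": dict(entry.specific_state),
        }
        for app_id, entry in entries.items()
    }


# Step 1: store state for two different types of application instance.
entries = {
    "storage-1": StateEntry("storage", "OK", {"free_bytes": 10_000}),
    "index-1": StateEntry("index", "NEW", {"shard": 3}),
}

# Step 2: select a peer; step 3: build the message that would be sent to it.
rng = random.Random(42)
peer = select_peer("dfdd-a", ["dfdd-a", "dfdd-b", "dfdd-c"], rng)
message = build_message(entries)
print(peer, message["storage-1"])
```

A DFDD instance in this sketch can forward `specific_state` without interpreting it, so new application types would need no change to the daemon.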
13. A non-transitory computer-accessible storage medium storing instructions that when executed by a computer implement a discovery and failure detection daemon (DFDD) configured to:
store operational state information for at least one of a plurality of application instances configured to perform functions of a distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application and wherein the state information includes global state information according to a global state machine defining transition between a plurality of global states including a state indicating a respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to status reports of the respective application instance; and
iteratively perform:
randomly or pseudorandomly select a peer DFDD instance from among a plurality of DFDD instances; and
execute a peer-to-peer, gossip-based protocol with the other peer instances of the DFDD instances, wherein to execute the protocol, the peer DFDD instances are configured to exchange state information for at least one of the plurality of application instances in the distributed system, the state information including both the global state information and the specific state information.
Dependent claims: 18, 19, 20
Specification