Coordinated garbage collection in distributed systems
First Claim
1. A system, comprising:
- a plurality of hardware computing nodes interconnected via a network, each of which comprises at least one processor and a memory, and each of which hosts one or more virtual machine instances, wherein each virtual machine instance has its own heap memory and garbage collector that are not shared with other virtual machine instances; and
a garbage collection coordinator;
wherein each of the virtual machine instances executes a respective process of a distributed application that communicates over the network with one or more other ones of the processes of the distributed application executing on respective other virtual machine instances, and wherein performing a garbage collection on one of the virtual machine instances hosted on a particular node of the plurality of computing nodes delays execution of one or more other of the virtual machine instances hosted on respective nodes of the plurality of computing nodes other than the particular node;
wherein the garbage collection coordinator is configured to;
receive information from at least one node of the plurality of hardware computing nodes indicating that the at least one node is not ready to receive communications from other ones of the processes of the distributed application;
determine that a particular garbage collection should be performed on one of the virtual machine instances dependent, at least in part, on the received information; and
initiate garbage collection on the one of the virtual machine instances.
1 Assignment
0 Petitions
Accused Products
Abstract
Fast modern interconnects may be exploited to control when garbage collection is performed on the nodes (e.g., virtual machines, such as JVMs) of a distributed system in which the individual processes communicate with each other and in which the heap memory is not shared. A garbage collection coordination mechanism (a coordinator implemented by a dedicated process on a single node or distributed across the nodes) may obtain or receive state information from each of the nodes and apply one of multiple supported garbage collection coordination policies to reduce the impact of garbage collection pauses, dependent on that information. For example, if the information indicates that a node is about to collect, the coordinator may trigger a collection on all of the other nodes (e.g., synchronizing collection pauses for batch-mode applications where throughput is important) or may steer requests to other nodes (e.g., for interactive applications where request latencies are important).
22 Citations
20 Claims
-
1. A system, comprising:
-
a plurality of hardware computing nodes interconnected via a network, each of which comprises at least one processor and a memory, and each of which hosts one or more virtual machine instances, wherein each virtual machine instance has its own heap memory and garbage collector that are not shared with other virtual machine instances; and a garbage collection coordinator; wherein each of the virtual machine instances executes a respective process of a distributed application that communicates over the network with one or more other ones of the processes of the distributed application executing on respective other virtual machine instances, and wherein performing a garbage collection on one of the virtual machine instances hosted on a particular node of the plurality of computing nodes delays execution of one or more other of the virtual machine instances hosted on respective nodes of the plurality of computing nodes other than the particular node; wherein the garbage collection coordinator is configured to; receive information from at least one node of the plurality of hardware computing nodes indicating that the at least one node is not ready to receive communications from other ones of the processes of the distributed application; determine that a particular garbage collection should be performed on one of the virtual machine instances dependent, at least in part, on the received information; and initiate garbage collection on the one of the virtual machine instances. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method, comprising:
performing, by a plurality of hardware computing nodes interconnected via a network; beginning execution of a distributed application, wherein respective processes of the distributed application executing on the plurality of hardware computing nodes communicate over the network with other processes of the distributed application executing on other ones of the plurality of hardware computing nodes, and wherein executing the distributed application comprises performing one or more particular operations that when performed on one of the plurality of computing nodes delays operations on one or more other ones of the plurality of computing nodes until it is complete; determining, during execution of the distributed application, that a given one of the plurality of computing nodes should perform one of the particular operations; receiving information from the given one of the plurality of computing nodes indicating that the given one of the plurality of computing nodes is not ready to receive communications from the other processes of the distributed application; and performing, by the given one of the plurality of computing nodes, the one of the particular operations; wherein said determining comprises applying a coordination policy that is dependent, at least in part, on the received information. - View Dependent Claims (11, 12, 13, 14, 15, 16)
-
17. A non-transitory, computer-readable storage medium storing program instructions that when executed on one or more computing nodes cause the one or more computing nodes to implement a garbage collection coordinator and an application programming interface for communication with the garbage collection coordinator;
-
wherein the garbage collection coordinator is configured to; receive information from each of one or more of a plurality of computing nodes on which a respective process of a distributed application is executing, wherein the respective process of the distributed application executing on each of the plurality of computing nodes communicates with one or more other ones of the respective processes executing on other ones of the computing nodes, and wherein the respective processes of the distributed application executing on each of the plurality of computing nodes do not share heap memory; and initiate a garbage collection on at least one of the plurality of computing nodes dependent on the information; wherein the information is usable to determine whether or not the computing node is ready to receive communications from other ones of the processes of the distributed application; wherein the information is received via an operation defined by the application programming interface; and wherein the garbage collection coordinator is configured to initiate the garbage collection via an operation defined by the application programming interface. - View Dependent Claims (18, 19, 20)
-
Specification