Dynamic scaling of a cluster of computing nodes used for distributed execution of a program
Abstract
Techniques are described for managing distributed execution of programs, including by dynamically scaling a cluster of multiple computing nodes used to perform ongoing distributed execution of a program, such as to increase and/or decrease the quantity of computing nodes in the cluster at various times and for various reasons. An architecture may be used that facilitates the dynamic scaling of a cluster, including by having at least some of the computing nodes act as core nodes that each participate in a distributed storage system for the distributed program execution, and having one or more other computing nodes that act as auxiliary nodes that do not participate in the distributed storage system. If computing nodes are selected to be removed from the cluster during ongoing distributed execution of a program, one or more nodes of the auxiliary computing node type may be selected for the removal.
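The core/auxiliary split described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Cluster` class, the `NodeType` enum, the node identifiers, and the policy of removing the earliest-added auxiliary nodes first are all hypothetical choices made for illustration. The only behavior drawn from the text is that a scale-down removes auxiliary nodes and leaves the core nodes, which participate in the distributed storage system, in place.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    CORE = "core"            # participates in the distributed storage system
    AUXILIARY = "auxiliary"  # compute only; not part of distributed storage


@dataclass
class Cluster:
    # Maps a (hypothetical) node id to its type, in insertion order.
    nodes: dict[str, NodeType] = field(default_factory=dict)

    def add_node(self, node_id: str, node_type: NodeType) -> None:
        """Scale up by adding one node of the given type."""
        self.nodes[node_id] = node_type

    def count(self, node_type: NodeType) -> int:
        """Number of nodes of the given type currently in the cluster."""
        return sum(1 for t in self.nodes.values() if t is node_type)

    def scale_down(self, quantity: int) -> list[str]:
        """Remove up to `quantity` auxiliary nodes; core nodes are never removed.

        Assumption: earliest-added auxiliary nodes are removed first.
        """
        removable = [n for n, t in self.nodes.items() if t is NodeType.AUXILIARY]
        removed = removable[:quantity]
        for node_id in removed:
            del self.nodes[node_id]
        return removed


# Usage: two core nodes and two auxiliary nodes, then a request to drop three.
cluster = Cluster()
cluster.add_node("core-1", NodeType.CORE)
cluster.add_node("core-2", NodeType.CORE)
cluster.add_node("aux-1", NodeType.AUXILIARY)
cluster.add_node("aux-2", NodeType.AUXILIARY)

removed = cluster.scale_down(3)
print(removed)                       # only auxiliary nodes are eligible
print(cluster.count(NodeType.CORE))  # core nodes are untouched
```

In this sketch a request to remove more nodes than there are auxiliary nodes is capped rather than rejected. A real service might instead refuse the request or take other action, and the abstract does not specify which.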
24 Claims
1. A computer-implemented method comprising:
receiving configuration information regarding execution of an indicated program in a distributed manner that includes executing a plurality of jobs of the indicated program, the configuration information being received by one or more computing systems configured to provide a distributed program execution service that performs distributed execution of programs for clients using a plurality of configurable computing nodes;
automatically determining a subset of multiple of the plurality of configurable computing nodes to be used for the execution of the indicated program in accordance with the received configuration information, the multiple computing nodes of the determined subset including a first group of one or more computing nodes to act as core computing nodes that each participate in a distributed storage system storing information used in the execution of the indicated program, the multiple computing nodes of the determined subset further including a second group of one or more computing nodes to act as auxiliary computing nodes that do not participate in the distributed storage system, the automatic determining of the subset being performed by the one or more configured computing systems;
initiating the execution of the indicated program in the distributed manner on the multiple computing nodes at a first time by executing one or more of the plurality of jobs of the indicated program on each of the multiple computing nodes, the initiating of the execution of the indicated program being performed by the one or more configured computing systems; and
at a second time subsequent to the first time and during the execution of the indicated program, determining to reduce a quantity of the multiple computing nodes that are used for ongoing execution of the indicated program, and automatically reducing the quantity of the multiple computing nodes by initiating removal of at least one of the auxiliary computing nodes of the second group without removing any of the core computing nodes of the first group, the automatic reducing of the quantity of the multiple computing nodes being performed by the one or more configured computing systems.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory computer-readable medium whose contents configure a computing system to perform a method, the method comprising:
initiating execution of an indicated program in a distributed manner on a cluster of multiple computing nodes at a first time;
at a second time subsequent to the first time and while the execution of the indicated program is ongoing, determining, by the configured computing system, to modify a quantity of the multiple computing nodes in the cluster during the ongoing execution of the indicated program;
identifying, by the configured computing system, a first subset of the multiple computing nodes that each are part of distributed storage at the second time for the ongoing execution of the indicated program, and a second subset of one or more of the multiple computing nodes that are distinct from the multiple computing nodes of the first subset and that are not part of the distributed storage at the second time; and
initiating, by the configured computing system, a change in the quantity of the multiple computing nodes of the cluster while the execution of the indicated program is ongoing in response to the determining to modify the quantity, the initiated change in the quantity being performed for one of the first subset and of the second subset that is selected for the initiated change based at least in part on a type of the determined modifying of the quantity, the initiating of the change being performed by the configured computing system.
Dependent claims: 12, 13, 14, 15
16. A computing system, comprising:
one or more processors; and
one or more modules of a distributed execution service that are configured to, when executed by at least one of the one or more processors, manage distributed execution of programs for clients by, for each of one or more of the clients:
receiving information from the client regarding execution of an indicated program in a distributed manner on a cluster of multiple computing nodes;
initiating the execution of the indicated program in the distributed manner on the cluster of multiple computing nodes at a first time;
at a second time subsequent to the first time and while the execution of the indicated program is ongoing, determining to modify a quantity of first computing nodes in the cluster that are not part of distributed storage for information used for the ongoing execution of the indicated program, the first computing nodes being distinct from at least some computing nodes in the cluster at the second time that each are part of the distributed storage; and
initiating a change in the quantity of the first computing nodes of the cluster while the execution of the indicated program is ongoing and in response to the determining to modify the quantity.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24
Specification