Use of temporarily available computing nodes for dynamic scaling of a cluster
Abstract
Techniques are described for managing distributed execution of programs, including by dynamically scaling a cluster of multiple computing nodes performing ongoing distributed execution of a program, such as to increase and/or decrease computing node quantity. An architecture may be used that has core nodes that each participate in a distributed storage system for the distributed program execution, and that has one or more other auxiliary nodes that do not participate in the distributed storage system. Furthermore, as part of performing the dynamic scaling of a cluster, computing nodes that are only temporarily available may be selected and used, such as computing nodes that might be removed from the cluster during the ongoing program execution to be put to other uses and that may also be available for a different fee (e.g., a lower fee) than other computing nodes that are available throughout the ongoing use of the cluster.
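The core/auxiliary split described in the abstract can be illustrated with a minimal Python sketch. Everything named here (`Node`, `Cluster`, `assign_pending`, `reclaim`) is hypothetical and not taken from the patent, and the round-robin assignment and the requeuing of a removed node's jobs are assumptions made only so the example runs. The patent text states only that execution continues when a temporarily available auxiliary node is removed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                 # "core" (holds distributed storage) or "auxiliary"
    temporary: bool = False   # True if the node may be reclaimed for another use
    jobs: list = field(default_factory=list)


class Cluster:
    def __init__(self, nodes, jobs):
        self.nodes = {n.name: n for n in nodes}
        self.pending = deque(jobs)

    def assign_pending(self):
        """Spread pending jobs over the current nodes (assumed round-robin policy)."""
        names = sorted(self.nodes)
        i = 0
        while self.pending and names:
            self.nodes[names[i % len(names)]].jobs.append(self.pending.popleft())
            i += 1

    def reclaim(self, name):
        """Remove an auxiliary node while execution continues on the rest."""
        node = self.nodes[name]
        if node.role != "auxiliary":
            # Core nodes hold part of the distributed storage, so this sketch
            # refuses to remove them through the reclaim path.
            raise ValueError("only auxiliary nodes can be reclaimed")
        del self.nodes[name]
        self.pending.extend(node.jobs)  # assumption: unfinished jobs are requeued
        self.assign_pending()
        return node


cluster = Cluster(
    [Node("core-1", "core"), Node("core-2", "core"),
     Node("aux-1", "auxiliary", temporary=True)],
    jobs=[f"job-{i}" for i in range(1, 7)],
)
cluster.assign_pending()
removed = cluster.reclaim("aux-1")  # the node is now free for its other use
```

Because the auxiliary node stores no part of the distributed storage system in this model, removing it only requires moving its jobs, which is the property that makes temporarily available nodes a natural fit for the auxiliary group.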
30 Claims
1. A computer-implemented method comprising:
receiving, by one or more configured computing systems of a distributed program execution service, configuration information regarding execution of an indicated program in a distributed manner that includes executing a plurality of jobs of the indicated program;
determining, by the one or more configured computing systems, multiple computing nodes for use in a cluster to perform the execution of the indicated program in accordance with the received configuration information, the multiple computing nodes for the cluster including a first group of one or more computing nodes to act as core computing nodes that each participate in a distributed storage system storing information used in the execution of the indicated program, the multiple computing nodes further including a second group of one or more computing nodes to act as auxiliary computing nodes that do not participate in the distributed storage system, wherein at least one auxiliary computing node of the second group has temporary availability while not otherwise being used and is selected for use in the second group based at least in part on the temporary availability;
initiating, by the one or more configured computing systems, the execution of the indicated program in the distributed manner on the multiple computing nodes of the cluster by executing one or more of the plurality of jobs on each of the multiple computing nodes; and
during the execution of the indicated program by the multiple computing nodes of the cluster, receiving an indication that the at least one auxiliary computing node of the second group is to be used for a distinct first use that is not related to the execution of the indicated program, and initiating, by the one or more configured computing systems, removal from the cluster of the at least one auxiliary computing node in the second group while the execution of the indicated program continues, to enable the removed at least one auxiliary computing node to be available for the distinct first use.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable medium having stored contents that configure a computing system to:
initiate, by the configured computing system, execution of an indicated program in a distributed manner on a cluster of multiple computing nodes at a first time, the initiating of the execution including executing one or more of a plurality of jobs of the indicated program on each of the multiple computing nodes;
at a second time subsequent to the first time and while the execution of the indicated program is ongoing, determine, by the configured computing system, to modify the cluster in a manner based at least in part on use of one or more computing nodes that have temporary availability until a distinct use of a higher priority occurs for the one or more computing nodes, the cluster including at the second time a first group of the multiple computing nodes that each are part of distributed storage for use during the execution of the indicated program; and
initiate, by the configured computing system, a change in the multiple computing nodes of the cluster while the execution of the indicated program is ongoing in response to the determining, the initiating of the change including selecting a second group of computing nodes of the cluster that are not part of the distributed storage and including performing a modification to the second group corresponding to the use of the one or more computing nodes having temporary availability, wherein the use of the one or more computing nodes having temporary availability is based at least in part on the temporary availability of the one or more computing nodes.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
26. A computing system, comprising:
one or more processors; and
one or more modules that are configured to, when executed by at least one of the one or more processors, manage distributed execution of an indicated program for a client by:
receiving information from the client regarding execution of the indicated program;
initiating the execution of the indicated program in a distributed manner on a cluster of multiple computing nodes at a first time in accordance with the received information, the initiating of the execution including attempting to execute one or more of a plurality of jobs of the indicated program on each of one or more of the multiple computing nodes;
at a second time subsequent to the first time and while the execution of the indicated program is ongoing, determining to modify a quantity of first computing nodes in the cluster that each execute one or more of the plurality of jobs of the indicated program without being part of distributed storage for information used for the ongoing execution of the indicated program, the first computing nodes being distinct from at least some other computing nodes in the cluster at the second time that each are part of the distributed storage; and
initiating a change in the quantity of the first computing nodes of the cluster while the execution of the indicated program is ongoing and in response to the determining, wherein the initiated change in the quantity of the first computing nodes corresponds to use of one or more computing nodes that have temporary availability for use in the cluster until a distinct use occurs for the one or more computing nodes, and wherein the use of the one or more computing nodes having temporary availability is based at least in part on the temporary availability of the one or more computing nodes.
Dependent claims: 27, 28, 29, 30
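The "determining to modify a quantity" step recited in claims 10 and 26 could be sketched as a small planning function. The claims do not specify how the determination is made, so the policy below (sizing the auxiliary group from the number of pending jobs) is purely an assumption for illustration, and the function name and every parameter are hypothetical. The one property taken from the claims is that only the group of nodes outside the distributed storage is resized.

```python
import math


def plan_auxiliary_change(pending_jobs, jobs_per_node, core_count,
                          aux_count, temp_available):
    """Return the change in auxiliary node count (positive adds, negative removes).

    Assumed policy: aim for enough nodes to run all pending jobs at once,
    leave the core group untouched, and satisfy growth only from nodes that
    are currently offered as temporarily available.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    needed_total = math.ceil(pending_jobs / jobs_per_node)
    desired_aux = max(0, needed_total - core_count)
    delta = desired_aux - aux_count
    if delta > 0:
        # Growth is capped by how many temporary nodes can be obtained.
        delta = min(delta, temp_available)
    return delta


grow = plan_auxiliary_change(pending_jobs=40, jobs_per_node=4,
                             core_count=2, aux_count=1, temp_available=5)
shrink = plan_auxiliary_change(pending_jobs=8, jobs_per_node=4,
                               core_count=2, aux_count=3, temp_available=5)
```

Under these assumed inputs `grow` is 5 (seven more nodes would be wanted, but only five temporary nodes are on offer) and `shrink` is -3 (the two core nodes can absorb the remaining jobs).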
Specification