Allocating resources for multi-phase, distributed computing jobs
Abstract
In one embodiment, data indicative of the size of an intermediate data set generated by a first resource device is received at a computing device. The intermediate data set is associated with a virtual machine to process the intermediate data set. A virtual machine configuration is determined based on the size of the intermediate data set. A second resource device is selected to execute the virtual machine based on the virtual machine configuration and on an available bandwidth between the first and second resource devices. The virtual machine is then assigned to the second resource device to process the intermediate data set.
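The flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `VMConfig`, `Host`, `choose_vm_config`, `select_host`, and `assign_reducer_vm` are hypothetical, the sizing rule (memory of twice the data size, one vCPU per 4 GiB) is an invented placeholder, and picking the eligible host with the highest bandwidth is only one possible reading of "based on the virtual machine configuration and on an available bandwidth".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

GIB = 1024 ** 3


@dataclass(frozen=True)
class VMConfig:
    vcpus: int
    memory_bytes: int


@dataclass
class Host:
    """A candidate second resource device (hypothetical model)."""
    name: str
    free_vcpus: int
    free_memory_bytes: int


def choose_vm_config(intermediate_bytes: int) -> VMConfig:
    # Assumed sizing rule: memory is twice the data size (at least 1 GiB),
    # and one vCPU per 4 GiB of data, rounded up (at least 1).
    memory = max(GIB, 2 * intermediate_bytes)
    vcpus = max(1, -(-intermediate_bytes // (4 * GIB)))
    return VMConfig(vcpus=vcpus, memory_bytes=memory)


def select_host(config: VMConfig, hosts: List[Host],
                bandwidth_from_source: Dict[str, float]) -> Optional[Host]:
    # Keep only hosts that can fit the VM configuration, then prefer the
    # one with the most available bandwidth from the first resource device.
    candidates = [h for h in hosts
                  if h.free_vcpus >= config.vcpus
                  and h.free_memory_bytes >= config.memory_bytes]
    if not candidates:
        return None
    return max(candidates, key=lambda h: bandwidth_from_source.get(h.name, 0.0))


def assign_reducer_vm(intermediate_bytes: int, hosts: List[Host],
                      bandwidth_from_source: Dict[str, float]
                      ) -> Optional[Tuple[str, VMConfig]]:
    config = choose_vm_config(intermediate_bytes)
    host = select_host(config, hosts, bandwidth_from_source)
    if host is None:
        return None
    # Reserve the capacity on the chosen host.
    host.free_vcpus -= config.vcpus
    host.free_memory_bytes -= config.memory_bytes
    return host.name, config
```

In this sketch a host that has the best link but too little free memory is skipped, so both criteria named in the abstract influence the result.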
16 Claims
1. A method comprising:
- receiving, at a computing device, data indicative of the size of an intermediate data set generated by a first resource device;
- associating the intermediate data set with a virtual machine to process the intermediate data set;
- determining a virtual machine configuration based on the size of the intermediate data set;
- selecting a second resource device to execute the virtual machine based on the virtual machine configuration and on an available bandwidth between the first and second resource devices; and
- assigning the virtual machine to the second resource device to process the intermediate data set, wherein the intermediate data set is generated by a mapper task executed within a virtual machine on the first resource device and the virtual machine assigned to the second resource device executes a reducer task to process the intermediate data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
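Claim 1 ties the intermediate data set to a mapper task and its processing to a reducer task. The sketch below illustrates, under stated assumptions, how a mapper could produce partitioned intermediate data and report the size of each partition. The word-count workload, the CRC32 partitioning, the JSON-based size estimate, and the function names `run_mapper`, `partition_sizes`, and `run_reducer` are all hypothetical choices made for illustration; the claim does not specify any of them.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def run_mapper(lines: List[str], num_partitions: int) -> Dict[int, List[Pair]]:
    # Emit (word, 1) pairs and route each word to a partition by hash,
    # so that one reducer sees every occurrence of a given word.
    partitions: Dict[int, List[Pair]] = defaultdict(list)
    for line in lines:
        for word in line.split():
            p = zlib.crc32(word.encode("utf-8")) % num_partitions
            partitions[p].append((word, 1))
    return dict(partitions)


def partition_sizes(partitions: Dict[int, List[Pair]]) -> Dict[int, int]:
    # Assumed size measure: byte length of the JSON-serialized partition.
    # This is the "data indicative of the size" a scheduler would receive.
    return {p: len(json.dumps(pairs).encode("utf-8"))
            for p, pairs in partitions.items()}


def run_reducer(pairs: List[Pair]) -> Dict[str, int]:
    # Sum the counts for each word in one partition.
    counts: Dict[str, int] = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)
```

A scheduler would read the output of `partition_sizes` before any reducer virtual machine is placed, which is the ordering the claim describes.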
9. An apparatus comprising:
- one or more network interfaces configured to communicate in a computer network;
- a processor configured to execute one or more processes; and
- a memory configured to store a process executable by the processor, the process when executed operable to:
  - receive data indicative of the size of an intermediate data set generated by a first resource device;
  - associate the intermediate data set with a virtual machine to process the intermediate data set;
  - determine a virtual machine configuration for processing a next computational phase of the intermediate data set based on the size of the intermediate data set;
  - select a second resource device to execute the virtual machine based on the virtual machine configuration and on an available bandwidth between the first and second resource devices; and
  - assign the virtual machine to the second resource device to process the intermediate data set, wherein the intermediate data set is generated by a mapper task executed within a virtual machine on the first resource device and the virtual machine assigned to the second resource device executes a reducer task to process the intermediate data set.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A tangible, non-transitory, computer-readable medium having software encoded thereon, the software, when executed by a processor, operable to:
- receive data indicative of the size of an intermediate data set generated by a first resource device;
- associate the intermediate data set with a virtual machine to process the intermediate data set;
- determine a virtual machine configuration for processing a next computational phase of the intermediate data set based on the size of the intermediate data set;
- select a second resource device to execute the virtual machine based on the virtual machine configuration and on an available bandwidth between the first and second resource devices; and
- assign the virtual machine to the second resource device to process the intermediate data set, wherein the intermediate data set is generated by a mapper task executed within a virtual machine on the first resource device and the virtual machine assigned to the second resource device executes a reducer task to process the intermediate data set.
Specification