Booting virtual machine instances in a distributed data processing architecture
First Claim
1. A method comprising:
storing a plurality of images in a distributed file system, wherein the distributed file system comprises a plurality of storage units, each of the plurality of images is an image of one of a plurality of virtual machines (VMs), a target data set is associated with a first VM of the plurality of VMs, the target data set comprises one or more redundant copies of data associated with the first VM, and the target data set is distributed across a plurality of the plurality of storage units;
allocating one or more computing resources, from an available pool of computing resources, to the first VM;
identifying at least two storage units in which the target data set is stored, wherein the at least two storage units are identified from among the plurality of storage units, and each of the at least two storage units stores a portion of the target data set but not all of the target data set;
selecting a corresponding host that has an acceptable level of physical proximity to the at least two storage units, wherein the corresponding host is selected from a plurality of hosts, and the acceptable level of physical proximity is based, at least in part, on one or more requirements of one or more applications that are to be executed on the first VM;
assigning the first VM to the corresponding host;
booting the first VM on the corresponding host, wherein booting the first VM comprises loading the one or more applications on the first VM; and
executing the one or more applications on the first VM, wherein the executing the one or more applications comprises processing the target data set by accessing both of the at least two storage units in which the target data set is stored.
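The selecting step recited above can be pictured with a minimal sketch. It assumes a hypothetical three-level topology (data center, rack, node) and a hypothetical integer threshold `max_distance` standing in for the application's proximity requirement. The claim itself does not prescribe a distance measure, so `Location`, `distance`, and `select_host` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical placement of a host or storage unit in the cluster."""
    datacenter: str
    rack: str
    node: str


def distance(a: Location, b: Location) -> int:
    """Assumed metric: 0 = same node, 1 = same rack, 2 = same data center, 3 = otherwise."""
    if a.datacenter != b.datacenter:
        return 3
    if a.rack != b.rack:
        return 2
    if a.node != b.node:
        return 1
    return 0


def select_host(hosts, storage_units, max_distance):
    """Pick the acceptable host with the smallest total distance to the storage units.

    hosts: dict mapping host name -> Location
    storage_units: list of Location, one per storage unit holding part of the data set
    max_distance: assumed per-application limit; a host is acceptable only if it is
                  within this distance of every listed storage unit
    Returns the chosen host name, or None if no host is acceptable.
    """
    best_name, best_cost = None, None
    for name in sorted(hosts):  # sorted for a deterministic tie-break
        dists = [distance(hosts[name], unit) for unit in storage_units]
        if max(dists) > max_distance:
            continue  # too far from at least one storage unit
        cost = sum(dists)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

Under these assumptions, an application with a strict requirement would pass a small `max_distance` and might get no host at all, while a tolerant one would accept a host in a neighboring rack.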
Abstract
Virtual machines (VMs) are booted in a big data framework within the context of a cluster of computing and storage devices. The big data framework comprises a distributed, location-aware file system and a cluster resource manager that assigns computing resources. VM images are stored as data in the distributed file system. Computing resources and hosts are allocated to specific VMs. The allocated hosts are within given levels of proximity to target data. VMs are booted and run on the hosts, and applications are run on the VMs, processing target data in the distributed file system. Prior to booting a given VM, a stored image can be converted between formats. This enables the VM format to be determined dynamically at boot time, based on what is supported by the hypervisor available on the target host.
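The boot-time format decision described in the abstract could look roughly like the following sketch. It assumes the target host's hypervisor reports an ordered list of supported image formats; the function name `choose_boot_format` and the format strings are hypothetical, and the actual conversion is left out.

```python
def choose_boot_format(stored_format, supported_formats):
    """Decide which image format to boot with on the target host.

    stored_format: format of the image as stored in the distributed file system
    supported_formats: formats the host's hypervisor accepts, in order of
                       preference (an assumption of this sketch)
    Returns (target_format, needs_conversion).
    """
    if not supported_formats:
        raise ValueError("hypervisor reports no supported image formats")
    if stored_format in supported_formats:
        # The stored image can be booted as-is.
        return stored_format, False
    # Otherwise convert to the hypervisor's most preferred format before booting.
    return supported_formats[0], True
```

For example, `choose_boot_format("vmdk", ["qcow2", "raw"])` returns `("qcow2", True)`, signalling that the stored image would need to be converted before boot.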
22 Claims
1. A method comprising:
storing a plurality of images in a distributed file system, wherein the distributed file system comprises a plurality of storage units, each of the plurality of images is an image of one of a plurality of virtual machines (VMs), a target data set is associated with a first VM of the plurality of VMs, the target data set comprises one or more redundant copies of data associated with the first VM, and the target data set is distributed across a plurality of the plurality of storage units;
allocating one or more computing resources, from an available pool of computing resources, to the first VM;
identifying at least two storage units in which the target data set is stored, wherein the at least two storage units are identified from among the plurality of storage units, and each of the at least two storage units stores a portion of the target data set but not all of the target data set;
selecting a corresponding host that has an acceptable level of physical proximity to the at least two storage units, wherein the corresponding host is selected from a plurality of hosts, and the acceptable level of physical proximity is based, at least in part, on one or more requirements of one or more applications that are to be executed on the first VM;
assigning the first VM to the corresponding host;
booting the first VM on the corresponding host, wherein booting the first VM comprises loading the one or more applications on the first VM; and
executing the one or more applications on the first VM, wherein the executing the one or more applications comprises processing the target data set by accessing both of the at least two storage units in which the target data set is stored.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium storing program instructions executable to:
store a plurality of images in a distributed file system, wherein the distributed file system comprises a plurality of storage units, each of the plurality of images is an image of one of a plurality of virtual machines (VMs), a target data set is associated with a first VM of the plurality of VMs, the target data set comprises one or more redundant copies of data associated with the first VM, and the target data set is distributed across a plurality of the plurality of storage units;
allocate one or more computing resources, from an available pool of computing resources, to the first VM;
identify at least two storage units in which the target data set is stored, wherein the at least two storage units are identified from among the plurality of storage units, and each of the at least two storage units stores a portion of the target data set but not all of the target data set;
select a corresponding host that has an acceptable level of physical proximity to the at least two storage units, wherein the corresponding host is selected from a plurality of hosts, and the acceptable level of physical proximity is based, at least in part, on one or more requirements of one or more applications that are to be executed on the first VM;
assign the first VM to the corresponding host;
boot the first VM on the corresponding host, wherein booting the first VM comprises loading the one or more applications on the first VM; and
execute the one or more applications on the first VM, wherein executing the one or more applications comprises processing the target data set by accessing both of the at least two storage units in which the target data set is stored.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. A system comprising:
one or more processors; and
a memory coupled to the one or more processors, wherein the memory stores program instructions executable by the one or more processors to:
store a plurality of images in a distributed file system, wherein the distributed file system comprises a plurality of storage units, each of the plurality of images is an image of one of a plurality of virtual machines (VMs), a target data set is associated with a first VM of the plurality of VMs, the target data set comprises one or more redundant copies of data associated with the first VM, and the target data set is distributed across a plurality of the plurality of storage units;
allocate one or more computing resources, from an available pool of computing resources, to the first VM;
identify at least two storage units in which the target data set is stored, wherein the at least two storage units are identified from among the plurality of storage units, and each of the at least two storage units stores a portion of the target data set but not all of the target data set;
select a corresponding host that has an acceptable level of physical proximity to the at least two storage units, wherein the corresponding host is selected from a plurality of hosts, and the acceptable level of physical proximity is based, at least in part, on one or more requirements of one or more applications that are to be executed on the first VM;
assign the first VM to the corresponding host;
boot the first VM on the corresponding host, wherein booting the first VM comprises loading the one or more applications on the first VM; and
execute the one or more applications on the first VM, wherein executing the one or more applications comprises processing the target data set by accessing both of the at least two storage units in which the target data set is stored.
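The identifying step shared by claims 1, 13, and 22 can be illustrated with a short sketch. It assumes the distributed file system exposes a hypothetical block map (storage unit name to the set of block IDs it holds) and that the target data set is described by a set of block IDs; `identify_partial_units` is an illustrative name, not something taken from the patent.

```python
def identify_partial_units(block_map, target_blocks):
    """List storage units that hold some, but not all, of the target data set.

    block_map: dict mapping storage unit name -> iterable of block IDs stored there
               (a hypothetical view of the file system's metadata)
    target_blocks: iterable of block IDs making up the target data set
    Returns the matching unit names in sorted order.
    """
    target = set(target_blocks)
    partial = []
    for unit in sorted(block_map):
        held = set(block_map[unit]) & target
        # Keep units with a non-empty, strictly partial share of the data set.
        if held and held != target:
            partial.append(unit)
    return partial
```

With `block_map = {"u1": {1, 2}, "u2": {2, 3}, "u3": {1, 2, 3}, "u4": {9}}` and a target of `{1, 2, 3}`, the function returns `["u1", "u2"]`; a caller following the claims would then check that at least two units were found before selecting a host.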
Specification