Provisioning a cluster of distributed computing platform based on placement strategy

US 9,268,590 B2
Filed: 02/29/2012
Issued: 02/23/2016
Est. Priority Date: 02/29/2012
Status: Active Grant

Abstract

Embodiments perform automated provisioning of a cluster for a distributed computing platform. Target host computing devices are selected from a plurality of host computing devices based on configuration information, such as a desired cluster size, a data set, code for processing the data set and, optionally, a placement strategy. One or more virtual machines (VMs) are instantiated on each target host computing device. Each VM is configured to access a virtual disk that is preconfigured with code for executing functionality of the distributed computing platform and serves as a node of the cluster. The data set is stored in a distributed file system accessible by at least a subset of the VMs. The code for processing the data set is provided to at least a subset of the VMs, and execution of the code is initiated to obtain processing results.
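The flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `ClusterConfig`, `Host`, `select_target_hosts`, and `plan_cluster` are hypothetical names, and the capacity-based selection rule and fixed `vms_per_host` value are assumptions made only so the example runs.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Hypothetical container for the configuration information."""
    cluster_size: int      # number of nodes (VMs) requested
    data_set: str          # where the input data lives (illustrative)
    job_code: str          # where the processing code lives (illustrative)
    vms_per_host: int = 1  # assumption: same number of VMs on every target host


@dataclass
class Host:
    name: str
    free_vm_slots: int


def select_target_hosts(hosts, config):
    """Pick hosts with the most free capacity until the cluster size is covered."""
    targets, remaining = [], config.cluster_size
    for host in sorted(hosts, key=lambda h: h.free_vm_slots, reverse=True):
        if remaining <= 0:
            break
        if host.free_vm_slots >= config.vms_per_host:
            targets.append(host)
            remaining -= config.vms_per_host
    if remaining > 0:
        raise ValueError("not enough host capacity for the requested cluster size")
    return targets


def plan_cluster(hosts, config):
    """Return one record per VM to instantiate; each VM is a node of the cluster."""
    nodes = []
    for host in select_target_hosts(hosts, config):
        for i in range(config.vms_per_host):
            if len(nodes) < config.cluster_size:
                nodes.append({"vm": f"{host.name}-vm{i}", "host": host.name})
    return nodes
```

A real provisioning system would go on to clone virtual disks from a VM template, load the data set into the distributed file system, and submit the processing code; those steps are omitted here.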

21 Claims

1. A method for provisioning a cluster for a distributed computing platform, the method comprising:
- receiving configuration information for the cluster, wherein the configuration information comprises at least a cluster size, a data set, and code for processing the data set;
  
  selecting a plurality of target host computing devices from a plurality of host computing devices based on the configuration information;
  
  instantiating, based on the cluster size, at least one virtual machine (VM) on each of the target host computing devices to serve as a node of the cluster, wherein each instantiated VM is configured to access a virtual disk that is based on a VM template in a set of VM templates and the at least one VM is preconfigured with distributed software computing code for executing functionality of the distributed computing platform based on the respective VM template;
  
  receiving physical location information for a plurality of racks in which each instantiated VM is located, wherein a rack includes multiple host computing devices of the plurality of target host computing devices;
  
  providing the physical location information to an instantiated VM, wherein the instantiated VM uses the physical location information to determine where to store the data set in a distributed file system accessible by at least a subset of the VMs based on a placement strategy for processing of the data set, wherein the placement strategy can be a placement strategy for operational robustness or operational efficiency or a combination of operational robustness and operational efficiency, wherein, given a placement strategy for operational robustness, data for the data set is stored in different racks using the physical location information, wherein, given a placement strategy for operational efficiency, replica data of the data for the data set is stored in a same location as the data for the data set, and wherein, given a placement strategy for a combination of operational robustness and operational efficiency, a first replica of the data for the data set is placed at a different location from that of original data for the data set and a second replica of the data for the data set is placed at the same location as that of the data for the data set and the distributed file system is accessed by the distributed computing platform during processing of the data set;
  
  providing the code for processing the data set to at least a subset of the VMs; and
  
  initiating execution of the code for processing the data set on the at least subset of VMs to obtain data processing results, wherein the at least subset of VMs use the distributed software computing code to execute the code for processing the data set.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein at least one of the instantiated VMs is configured to utilize local storage in the target host computing device on which the VM has been instantiated, and the local storage supports the distributed file system.
  - 3. The method of claim 2, wherein different portions of the data set are stored in different local storages corresponding to target host computing devices of the instantiated VMs.
  - 4. The method of claim 1, further comprising the step of selecting the VM template that serves as a basis for virtual disks based on the configuration information.
  - 5. The method of claim 4, wherein at least one virtual disk of an instantiated VM is a linked clone of the VM template.
  - 6. The method of claim 1, further comprising the step of enabling a user to maintain a life cycle of the cluster for future processing of data.
  - 7. The method of claim 1, wherein the step of selecting the plurality of target host computing devices is based on the placement strategy and a location of each of the target host computing devices, wherein the location includes one or more of the following:
    - a physical location and a network location.
  - 8. The method of claim 1, further comprising:
    - generating a script for use by each instantiated VM to identify the physical location information for each instantiated VM; and
      
      providing the script to each instantiated VM, wherein each instantiated VM uses the script to provide respective location information for each instantiated VM.
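The three placement strategies recited in claim 1 can be illustrated with a small sketch. It assumes that "location" means the rack holding a VM, that two replicas are wanted by default, and that candidates are taken in name order; the function `place_replicas` and the strategy labels are hypothetical, and the claims do not prescribe this particular selection logic.

```python
def _one_per_rack(vms, vm_racks):
    """Keep only the first VM seen in each rack."""
    seen, picked = set(), []
    for vm in vms:
        if vm_racks[vm] not in seen:
            seen.add(vm_racks[vm])
            picked.append(vm)
    return picked


def place_replicas(strategy, source_vm, vm_racks, replicas=2):
    """Choose VMs to hold replicas of data first written on source_vm.

    vm_racks maps VM name -> rack id (the physical location information).
    """
    home = vm_racks[source_vm]
    local = sorted(v for v, r in vm_racks.items() if r == home and v != source_vm)
    remote = sorted(v for v, r in vm_racks.items() if r != home)

    if strategy == "robustness":
        # Spread replicas over distinct racks other than the source rack.
        picks = _one_per_rack(remote, vm_racks)
    elif strategy == "efficiency":
        # Keep replicas in the same rack as the original data.
        picks = local
    elif strategy == "combined":
        # First replica in a different rack, remaining replicas in the home rack.
        picks = remote[:1] + local
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    if len(picks) < replicas:
        raise ValueError("topology cannot satisfy the requested strategy")
    return picks[:replicas]
```

With VMs `vm1` to `vm3` in rack `r1`, `vm4` and `vm5` in `r2`, and `vm6` in `r3`, data written on `vm1` gets replicas on `vm4` and `vm6` for robustness, on `vm2` and `vm3` for efficiency, and on `vm4` and `vm2` for the combined strategy.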

9. One or more non-transitory computer-readable storage media including computer-executable instructions that, when executed by a computer processor, cause the computer processor to provision a cluster of a distributed computing platform having a plurality of virtual machines (VMs) by:
- receiving configuration information for the cluster, wherein the configuration information comprises at least a cluster size, a data set, and code for processing the data set;
  
  selecting a plurality of target host computing devices from a plurality of host computing devices based on the configuration information;
  
  instantiating, based on the cluster size, at least one VM on each of the target host computing devices to serve as a node of the cluster, wherein each instantiated VM is configured to access a virtual disk that is based on a VM template in a set of VM templates and the at least one VM is preconfigured with distributed software computing code for executing functionality of the distributed computing platform based on the respective VM template;
  
  receiving physical location information for a plurality of racks in which each instantiated VM is located, wherein a rack includes multiple host computing devices of the plurality of target host computing devices;
  
  providing the physical location information to an instantiated VM, wherein the instantiated VM uses the physical location information to determine where to store the data set in a distributed file system accessible by at least a subset of the VMs based on a placement strategy for processing of the data set, wherein the placement strategy can be a placement strategy for operational robustness or operational efficiency or a combination of operational robustness and operational efficiency, wherein, given a placement strategy for operational robustness, data for the data set is stored in different racks using the physical location information, wherein, given a placement strategy for operational efficiency, replica data of the data for the data set is stored in a same location as the data for the data set, and wherein, given a placement strategy for a combination of operational robustness and operational efficiency, a first replica of the data for the data set is placed at a different location from that of original data for the data set and a second replica of the data for the data set is placed at the same location as that of the data for the data set and the distributed file system is accessed by the distributed computing platform during processing of the data set;
  
  providing the code for processing the data set to at least a subset of the VMs; and
  
  initiating execution of the code for processing the data set on the at least subset of VMs to obtain data processing results, wherein the at least subset of VMs use the distributed software computing code to execute the code for processing the data set.
- View Dependent Claims (10, 11, 12, 13, 14, 15)
- - 10. The non-transitory computer-readable storage media of claim 9, wherein at least one of the instantiated VMs is configured to utilize a local storage in the target host computing device on which the VM has been instantiated, and the local storage supports the distributed file system.
  - 11. The non-transitory computer-readable storage media of claim 10, wherein the computer-executable instructions cause the computer processor to store different portions of the data set in different local storages corresponding to target host computing devices of the instantiated VMs.
  - 12. The non-transitory computer-readable storage media of claim 9, wherein the computer-executable instructions further cause the computer processor to select the VM template that serves as a basis for virtual disks based on the configuration information.
  - 13. The non-transitory computer-readable storage media of claim 12, wherein at least one virtual disk of an instantiated VM is a linked clone of the VM template.
  - 14. The non-transitory computer-readable storage media of claim 9, wherein the computer-executable instructions further cause the computer processor to enable a user to maintain a life cycle of the cluster for future processing of data.
  - 15. The non-transitory computer-readable storage media of claim 9, wherein the step of selecting the target host computing devices is based on the placement strategy and a location of each of the target host computing devices, wherein the location includes one or more of the following:
    - a physical location and a network location.

16. A system for provisioning a cluster of a distributed computing platform, the system comprising:
- a plurality of host computing devices; and
  
  a management device coupled in communication with the host computing devices and configured to:
  
  receive configuration information for the cluster, wherein the configuration information comprises at least a cluster size, a data set, and code for processing the data set;
  
  select a plurality of target host computing devices from the plurality of host computing devices based on the configuration information;
  
  instantiate, based on the cluster size, at least one virtual machine (VM) on each of the target host computing devices to serve as a node of the cluster, wherein each instantiated VM is configured to access a virtual disk that is based on a VM template in the set of VM templates and the at least one VM is preconfigured with distributed software computing code for executing functionality of the distributed computing platform based on the respective VM template;
  
  receive physical location information for a plurality of racks in which each instantiated VM is located, wherein a rack includes multiple host computing devices of the plurality of target host computing devices;
  
  provide the physical location information to an instantiated VM, wherein the instantiated VM uses the physical location information to determine where to store the data set in a distributed file system accessible by at least a subset of the VMs based on a placement strategy for processing of the data set, wherein the placement strategy can be a placement strategy for operational robustness or operational efficiency or a combination of operational robustness and operational efficiency, wherein, given a placement strategy for operational robustness, data for the data set is stored in different racks using the physical location information, wherein, given a placement strategy for operational efficiency, replica data of the data for the data set is stored in a same location as the data for the data set, and wherein, given a placement strategy for a combination of operational robustness and operational efficiency, a first replica of the data for the data set is placed at a different location from that of original data for the data set and a second replica of the data for the data set is placed at the same location as that of the data for the data set and the distributed file system is accessed by the distributed computing platform during processing of the data set;
  
  provide the code for processing the data set to at least a subset of the VMs; and
  
  initiate execution of the code for processing the data set on the at least subset of VMs to obtain data processing results, wherein the at least subset of VMs use the distributed software computing code to execute the code for processing the data set.
- View Dependent Claims (17, 18, 19, 20, 21)
- - 17. The system of claim 16, wherein at least one of the instantiated VMs is configured to utilize a local storage in the target host computing device on which the VM has been instantiated, and the local storage supports the distributed file system.
  - 18. The system of claim 17, wherein the management device is configured to store different portions of the data set in different local storages corresponding to target host computing devices of the instantiated VMs.
  - 19. The system of claim 16, wherein the management device is further configured to select the VM template that serves as a basis for virtual disks based on the configuration information.
  - 20. The system of claim 19, wherein the management device is further configured to enable a user to maintain a life cycle of the cluster.
  - 21. The system of claim 16, wherein the management device is configured to select the target host computing devices based on the placement strategy and a location of each of the target host computing devices, wherein the location includes one or more of the following:
    - a physical location and a network location.
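Claims 7, 15, and 21 tie host selection to the placement strategy and to each host's location. One way to picture this, offered only as an assumption-laden sketch, is to spread the chosen hosts across racks when robustness matters and to pack them into as few racks as possible when efficiency matters. The function `select_hosts` and the strategy labels below are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from itertools import zip_longest


def select_hosts(host_racks, count, strategy):
    """Pick `count` hosts given a mapping of host name -> rack id."""
    by_rack = defaultdict(list)
    for host in sorted(host_racks):
        by_rack[host_racks[host]].append(host)

    if strategy == "robustness":
        # Round-robin over racks so the selection spans as many racks as possible.
        groups = zip_longest(*(by_rack[r] for r in sorted(by_rack)))
        ordered = [h for group in groups for h in group if h is not None]
    elif strategy == "efficiency":
        # Fill the largest rack first so nodes stay physically close together.
        racks = sorted(by_rack, key=lambda r: (-len(by_rack[r]), r))
        ordered = [h for r in racks for h in by_rack[r]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    if count > len(ordered):
        raise ValueError("not enough hosts available")
    return ordered[:count]
```

Network location (for example, which switch a host hangs off) could be handled the same way by substituting a network identifier for the rack id.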

Current Assignee
VMware LLC (Broadcom, Inc.)
Original Assignee
VMware, Inc. (Broadcom, Inc.)
Inventors
Du, Junping; He, Ying; Wan, Da; Xiao, Jun
Primary Examiner(s)
Rashid, Wissam

Application Number

US13/407,895
Publication Number

US 20130227558A1
Time in Patent Office

1,455 Days
Field of Search

None
US Class Current

1/1
CPC Class Codes

G06F 9/45558   Hypervisor-specific managem...

G06F 9/5077   Logical partitioning of res...

H04L 43/04   Processing captured monitor...

H04L 67/52   specially adapted for the l...
