Provisioning a cluster of distributed computing platform based on placement strategy

  • US 9,268,590 B2
  • Filed: 02/29/2012
  • Issued: 02/23/2016
  • Est. Priority Date: 02/29/2012
  • Status: Active Grant
First Claim

1. A method for provisioning a cluster for a distributed computing platform, the method comprising:

    receiving configuration information for the cluster, wherein the configuration information comprises at least a cluster size, a data set, and code for processing the data set;

    selecting a plurality of target host computing devices from a plurality of host computing devices based on the configuration information;

    instantiating, based on the cluster size, at least one virtual machine (VM) on each of the target host computing devices to serve as a node of the cluster, wherein each instantiated VM is configured to access a virtual disk that is based on a VM template in a set of VM templates and the at least one VM is preconfigured with distributed software computing code for executing functionality of the distributed computing platform based on the respective VM template;

    receiving physical location information for a plurality of racks in which each instantiated VM is located, wherein a rack includes multiple host computing devices of the plurality of target host computing devices;

    providing the physical location information to an instantiated VM, wherein the instantiated VM uses the physical location information to determine where to store the data set in a distributed file system accessible by at least a subset of the VMs based on a placement strategy for processing of the data set, wherein the placement strategy can be a placement strategy for operational robustness or operational efficiency or a combination of operational robustness and operational efficiency, wherein, given a placement strategy for operational robustness, data for the data set is stored in different racks using the physical location information, wherein, given a placement strategy for operational efficiency, replica data of the data for the data set is stored in a same location as the data for the data set, and wherein, given a placement strategy for a combination of operational robustness and operational efficiency, a first replica of the data for the data set is placed at a different location from that of original data for the data set and a second replica of the data for the data set is placed at the same location as that of the data for the data set and the distributed file system is accessed by the distributed computing platform during processing of the data set;

    providing the code for processing the data set to at least a subset of the VMs; and

    initiating execution of the code for processing the data set on the at least subset of VMs to obtain data processing results, wherein the at least subset of VMs use the distributed software computing code to execute the code for processing the data set.
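
The three placement strategies recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. The names `Strategy`, `place_replicas`, and `node_to_rack` are hypothetical, and the sketch assumes that "location" means a rack and that a same-location replica goes to a different node in the original's rack.

```python
from enum import Enum


class Strategy(Enum):
    ROBUSTNESS = "robustness"   # spread data across different racks
    EFFICIENCY = "efficiency"   # keep replica data in the same location
    COMBINED = "combined"       # one replica elsewhere, one in the same location


def place_replicas(strategy, primary_node, node_to_rack):
    """Return the nodes that should hold replicas of data stored on primary_node.

    node_to_rack is a hypothetical stand-in for the physical location
    information: it maps each VM (node) name to the rack that hosts it.
    """
    primary_rack = node_to_rack[primary_node]
    # Candidate nodes in the same rack as the original (assumed "same location").
    same_rack = sorted(
        n for n, r in node_to_rack.items()
        if r == primary_rack and n != primary_node
    )
    # Candidate nodes in any other rack (assumed "different location").
    other_rack = sorted(n for n, r in node_to_rack.items() if r != primary_rack)

    if strategy is Strategy.ROBUSTNESS:
        if not other_rack:
            raise ValueError("robustness needs a node in a different rack")
        return [other_rack[0]]
    if strategy is Strategy.EFFICIENCY:
        if not same_rack:
            raise ValueError("efficiency needs another node in the same rack")
        return [same_rack[0]]
    if strategy is Strategy.COMBINED:
        if not other_rack or not same_rack:
            raise ValueError("combined needs nodes in both the same and another rack")
        # First replica at a different location, second at the same location.
        return [other_rack[0], same_rack[0]]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical topology: two VMs in rack-a, one VM in rack-b.
topology = {"vm-a1": "rack-a", "vm-a2": "rack-a", "vm-b1": "rack-b"}
combined = place_replicas(Strategy.COMBINED, "vm-a1", topology)
```

With this topology, the combined strategy picks one node in the other rack and one in the original's rack, mirroring the first-replica and second-replica language of the claim.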
