Identification of virtual machines using a distributed job scheduler

  • US 10,007,445 B2
  • Filed: 02/20/2015
  • Issued: 06/26/2018
  • Est. Priority Date: 11/04/2014
  • Status: Active Grant
First Claim

1. A method for operating a data management system, comprising:

    storing a first set of snapshots of a first virtual machine as a first set of files using a distributed file system, the distributed file system replicates the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine;

    storing a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files using the distributed file system, the distributed file system replicates the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine;

    determining a first job associated with the first virtual machine to be performed using a distributed job scheduler, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes;

    determining that a first node of the plurality of nodes stores the first set of files; and

    running the first job on the first node in response to determining that the first node stores the first set of files, the first job comprising:

    generating a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image;

    comparing the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the first virtual machine;

    identifying the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated;

    generating the dependent base file using the first base image for the first virtual machine and the second base image for the second virtual machine; and

    storing the dependent base file for the first virtual machine using the distributed file system.
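
The sampling-and-comparison steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. Every name here (`sample_offsets`, `block_hashes`, `similarity`, `pick_candidate`, `nodes_storing`) is hypothetical, and several details are assumptions the claim does not specify: SHA-256 as the hash, geometric growth for the "monotonically increasing" spacing, position-wise matching of hashes, and a similarity threshold for accepting a candidate base image.

```python
import hashlib


def sample_offsets(image_size, block_size, fixed_end, fixed_stride,
                   initial_gap, growth=2):
    """Pick block start offsets in two non-overlapping portions.

    Portion 1, [0, fixed_end): blocks spaced at a fixed stride.
    Portion 2, [fixed_end, image_size): gaps grow by `growth` each step
    (assumed geometric; the claim only requires increasing distances).
    """
    offsets = []
    pos = 0
    while pos < fixed_end and pos + block_size <= image_size:
        offsets.append(pos)
        pos += fixed_stride
    pos = max(pos, fixed_end)  # start of the second portion
    gap = initial_gap
    while pos + block_size <= image_size:
        offsets.append(pos)
        pos += gap
        gap *= growth
    return offsets


def block_hashes(image, offsets, block_size):
    """Hash each sampled block (SHA-256 is an assumption)."""
    return [hashlib.sha256(image[o:o + block_size]).hexdigest()
            for o in offsets]


def similarity(hashes_a, hashes_b):
    """Fraction of sampled positions whose hashes match."""
    if not hashes_a:
        return 0.0
    matches = sum(1 for a, b in zip(hashes_a, hashes_b) if a == b)
    return matches / len(hashes_a)


def pick_candidate(target_hashes, other_images, threshold=0.5):
    """Return the name of the most similar image, or None if none
    reaches the (assumed) threshold."""
    best_name, best_score = None, threshold
    for name, hashes in other_images.items():
        score = similarity(target_hashes, hashes)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def nodes_storing(file_set, placement):
    """Nodes whose local files include every file in `file_set`.

    `placement` maps node name -> set of file names held on that node;
    a scheduler could run the job on any node returned here.
    """
    return sorted(node for node, files in placement.items()
                  if file_set <= files)
```

Sampling densely near the start of the image and sparsely later keeps the number of hashes small for large images while still covering the whole image. Whether that matches the rationale behind the claimed arrangement is not stated in the claim.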
