Identification of virtual machines using a distributed job scheduler

US 10,007,445 B2
Filed: 02/20/2015
Issued: 06/26/2018
Est. Priority Date: 11/04/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.

56 Citations

20 Claims

1. A method for operating a data management system, comprising:
- storing a first set of snapshots of a first virtual machine as a first set of files using a distributed file system, the distributed file system replicates the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine;
  
  storing a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files using the distributed file system, the distributed file system replicates the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine;
  
  determining a first job associated with the first virtual machine to be performed using a distributed job scheduler, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes;
  
  determining that a first node of the plurality of nodes stores the first set of files; and
  
  running the first job on the first node in response to determining that the first node stores the first set of files, the first job comprising:
  
  generating a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image;
  
  comparing the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the first virtual machine;
  
  identifying the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated;
  
  generating the dependent base file using the first base image for the first virtual machine and the second base image for the second virtual machine; and
  
  storing the dependent base file for the first virtual machine using the distributed file system.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1, further comprising:
    - determining that the first job has been completely executed subsequent to running the first job on the first node; and
      
      updating a state of the first job that is stored within a distributed metadata store in response to determining that the first job has been completely executed.
  - 3. The method of claim 2, wherein:
    - the first job comprises a series of tasks that are to be performed atomically, the determining that the first job has been completely executed includes detecting that each of the series of tasks has been performed without a failure being detected.
  - 4. The method of claim 2, wherein:
    - the distributed metadata store comprises a distributed database, the distributed database replicates the state of the first job among at least a subset of the plurality of nodes.
  - 5. The method of claim 1, further comprising:
    - determining that the first job has failed to be completely executed within a threshold period of time; and
      
      updating a state of the first job that is stored within a distributed metadata store in response to determining that the first job has failed to be completely executed within the threshold period of time.
  - 6. The method of claim 1, wherein:
    - the first set of files includes a first file that is stored as a plurality of chunks within the distributed file system, the first file comprises a full image-level backup of the first virtual machine.
  - 7. The method of claim 1, further comprising:
    - detecting that the first job has failed to be completely executed within a threshold period of time or that the first job has failed; and
      
      undoing one or more tasks performed by the first job in response to detecting that the first job has failed to be completely executed within the threshold period of time or that the first job has failed.
  - 8. The method of claim 1, further comprising:
    - detecting that the first node has failed while running the first job; and
      
      rolling back one or more tasks performed by the first job in response to detecting that the first node has failed.
  - 9. The method of claim 1, wherein:
    - the dependent base file comprises data differences between the first base image for the first virtual machine and the second base image for the second virtual machine.
  - 10. The method of claim 1, wherein:
    - each data block within the first portion is separated by a fixed data length; and
      
      each data block within the second portion is separated by an increasing data length.
  - 11. The method of claim 1, wherein:
    - the determining the first job associated with the first virtual machine includes determining a snapshot consolidation frequency for the first virtual machine and determining the first job based on the snapshot consolidation frequency.
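The block-sampling and hash-comparison steps of claims 1 and 10 can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the 4 KiB block size, the SHA-256 hash, the geometric growth of the gaps, the 0.5 match threshold, and every function and parameter name (`sample_offsets`, `fingerprint`, `match_ratio`, `pick_candidate`) are assumptions that do not appear in the claims.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes; the claims do not specify one


def sample_offsets(image_size: int, fixed_region: int, fixed_stride: int,
                   first_gap: int, growth: int = 2) -> list[int]:
    """Return byte offsets of the data blocks to hash.

    Blocks inside [0, fixed_region) are spaced a fixed distance apart.
    Blocks after that region are spaced at strictly increasing distances.
    The two regions do not overlap.
    """
    offsets = []
    pos = 0
    # First portion: fixed spacing.
    while pos < fixed_region and pos + BLOCK_SIZE <= image_size:
        offsets.append(pos)
        pos += fixed_stride
    # Second portion: each gap is `growth` times the previous one.
    pos = max(pos, fixed_region)
    gap = first_gap
    while pos + BLOCK_SIZE <= image_size:
        offsets.append(pos)
        pos += gap
        gap *= growth
    return offsets


def fingerprint(image: bytes, offsets: list[int]) -> list[str]:
    """Hash the sampled block at each offset (SHA-256 is an assumption)."""
    return [hashlib.sha256(image[o:o + BLOCK_SIZE]).hexdigest() for o in offsets]


def match_ratio(fp_a: list[str], fp_b: list[str]) -> float:
    """Fraction of sampled positions whose hashes agree."""
    pairs = list(zip(fp_a, fp_b))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def pick_candidate(target_fp: list[str], others: dict[str, list[str]],
                   threshold: float = 0.5):
    """Return the name of the best-matching base image, or None if none reaches the threshold."""
    best_name, best_score = None, threshold
    for name, fp in others.items():
        score = match_ratio(target_fp, fp)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Sampling densely near the start of the image and sparsely further in keeps the number of hashes small for large images while still covering the whole file; whether the patent's embodiments use this particular growth schedule is not stated in the claims.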

12. A data management system, comprising:
- a distributed file system configured to store a first set of snapshots of a first virtual machine as a first set of files, the distributed file system configured to replicate the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine, the distributed file system configured to store a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files, the distributed file system configured to replicate the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine; and
  
  a distributed job scheduler configured to determine a first job associated with the first virtual machine to be performed, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes, the distributed job scheduler configured to determine that a first node of the plurality of nodes stores the first set of files and configured to run the first job on the first node in response to the determination that the first node stores the first set of files, the first job configured to generate a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image, the first job configured to compare the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the virtual machine and configured to identify the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated, the first job configured to generate the dependent base file using the first base image for the first virtual machine and the second base image for the second virtual machine, the dependent base file comprises data differences between the first base image for the first virtual machine and the second base image for the second virtual machine.
- Dependent claims (13, 14, 15, 16, 17, 18, 19)
  - 13. The data management system of claim 12, wherein:
    - the distributed job scheduler configured to determine that the first job has been completely executed and update a state of the first job that is stored within a distributed metadata store in response to determining that the first job has been completely executed.
  - 14. The data management system of claim 13, wherein:
    - the first job comprises a series of tasks that are to be performed atomically, the distributed job scheduler configured to determine that the first job has been completely executed by detecting that each of the series of tasks has been performed without a failure being detected.
  - 15. The data management system of claim 13, wherein:
    - the distributed metadata store comprises a distributed database, the distributed database configured to replicate the state of the first job among at least a subset of the plurality of nodes.
  - 16. The data management system of claim 12, wherein:
    - the distributed job scheduler configured to determine that the first job has failed to be completely executed within a threshold period of time and update a state of the first job that is stored within a distributed metadata store in response to determining that the first job has failed to be completely executed within the threshold period of time.
  - 17. The data management system of claim 12, wherein:
    - the distributed job scheduler configured to detect that the first job has failed to be completely executed within a threshold period of time and roll back one or more tasks performed by the first job in response to detecting that the first job has failed to be completely executed within the threshold period of time.
  - 18. The data management system of claim 12, wherein:
    - the distributed job scheduler configured to determine a snapshot capture frequency at which snapshots of the first virtual machine are to be captured and determine the first job based on the snapshot capture frequency.
  - 19. The data management system of claim 12, wherein:
    - the distributed job scheduler configured to determine a snapshot consolidation frequency for the first virtual machine and determine the first job based on the snapshot consolidation frequency.
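The scheduling behavior recited in claims 12 through 17 (run the job on a node that already stores the files, record job state, and roll back on failure or timeout) can be sketched as below. This is a single-process illustration under stated assumptions: a plain dictionary stands in for the distributed metadata store, the state names `RUNNING`, `COMPLETED`, and `FAILED` are hypothetical, and `Task`, `choose_node`, and `run_job` are names invented for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    do: Callable[[], None]    # performs the task
    undo: Callable[[], None]  # reverses the task during rollback


def choose_node(file_locations: dict[str, set[str]], file_set: set[str],
                nodes: list[str]) -> Optional[str]:
    """Return the first node that stores every file in file_set, else None."""
    for node in nodes:
        if file_set <= file_locations.get(node, set()):
            return node
    return None


def run_job(job_id: str, tasks: list[Task], metadata_store: dict[str, str],
            timeout_s: float, clock: Callable[[], float] = time.monotonic) -> bool:
    """Run tasks in order as one all-or-nothing unit.

    On any task failure, or when the time budget is exceeded, the tasks
    already performed are undone in reverse order and the job is marked FAILED.
    """
    metadata_store[job_id] = "RUNNING"
    start = clock()
    done: list[Task] = []
    try:
        for task in tasks:
            if clock() - start > timeout_s:
                raise TimeoutError(f"job {job_id} exceeded {timeout_s}s")
            task.do()
            done.append(task)
    except Exception:
        for task in reversed(done):
            task.undo()
        metadata_store[job_id] = "FAILED"
        return False
    metadata_store[job_id] = "COMPLETED"
    return True
```

In a real cluster the state update would go through a replicated database and node failure would be detected by a different node's scheduling process; the sketch only shows the control flow.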

20. One or more storage devices containing processor readable code for programming one or more processors to perform a method for operating a data management system, the processor readable code comprising:
- processor readable code configured to store a first set of snapshots of a first virtual machine as a first set of files using a distributed file system, the distributed file system replicates the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine;
  
  processor readable code configured to store a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files using the distributed file system, the distributed file system replicates the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine;
  
  processor readable code configured to determine a first job associated with the first virtual machine to be performed using a distributed job scheduler, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes;
  
  processor readable code configured to determine that a first node of the plurality of nodes stores the first set of files; and
  
  processor readable code configured to run the first job on the first node in response to determining that the first node stores the first set of files, the first job generates a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine and compares the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image, the first job identifies the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated and generates the dependent base file using the first base image for the first virtual machine and the second base image for the second virtual machine, the dependent base file comprises data differences between the first base image for the first virtual machine and the second base image for the second virtual machine.
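Claims 9, 12, and 20 describe the dependent base file as the data differences between the first and second base images. A minimal Python sketch of one way to represent such differences follows. It assumes a block-level comparison with a 4 KiB block size and an in-memory dictionary layout; the claims do not specify the diff granularity or file format, and `build_dependent_base` and `restore_first_image` are hypothetical names.

```python
BLOCK_SIZE = 4096  # assumed comparison granularity in bytes


def build_dependent_base(first_image: bytes, second_image: bytes,
                         block_size: int = BLOCK_SIZE) -> dict:
    """Keep only the blocks of first_image that differ from second_image."""
    blocks = {}
    for off in range(0, len(first_image), block_size):
        block = first_image[off:off + block_size]
        if block != second_image[off:off + block_size]:
            blocks[off] = block
    return {"length": len(first_image), "blocks": blocks}


def restore_first_image(dependent: dict, second_image: bytes) -> bytes:
    """Rebuild first_image from the candidate base image plus the stored differences."""
    length = dependent["length"]
    # Start from the candidate base image, truncated or zero-padded to the target length.
    out = bytearray(second_image[:length].ljust(length, b"\0"))
    for off, block in dependent["blocks"].items():
        out[off:off + len(block)] = block
    return bytes(out)
```

When the two virtual machines share most of their contents (for example, the same operating system install), the stored block map is much smaller than a full base image, which is the storage saving the dependent base file is meant to provide.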

Current Assignee: Rubrik, Inc.
Original Assignee: Rubrik, Inc.
Inventors: Nithrakashyap, Arvind; Madheswaran, Jayanth; Jain, Arvind; Mazumdar, Soham; Derryberry, Jonathan
Primary Examiner(s): Syed, Farhan
Application Number: US14/628,041
Publication Number: US 20160124978A1
Time in Patent Office: 1,222 Days
CPC Class Codes

G06F 11/1435: using file system or storag...
G06F 11/1446: Point-in-time backing up or...
G06F 11/1448: Management of the data invo...
G06F 11/1451: by selection of backup cont...
G06F 11/1453: using de-duplication of the...
G06F 11/1458: Management of the backup or...
G06F 11/1461: Backup scheduling policy
G06F 11/1464: for networked environments
G06F 11/1484: involving virtual machines
G06F 11/202: where processing functional...
G06F 16/113: Details of archiving lifecy...
G06F 16/128: Details of file system snap...
G06F 16/13: File access structures, e.g...
G06F 16/148: File search processing
G06F 16/27: Replication, distribution o...
G06F 16/84: Mapping; Conversion
G06F 2009/45562: Creating, deleting, cloning...
G06F 2009/4557: Distribution of virtual mac...
G06F 2009/45579: I/O management, e.g. provid...
G06F 2009/45583: Memory management, e.g. acc...
G06F 2201/80: Database-specific techniques
G06F 2201/815: Virtual middleware or OS fu...
G06F 2201/84: Using snapshots, i.e. a log...
G06F 3/0619: in relation to data integri...
G06F 3/0641: De-duplication techniques
G06F 3/065: Replication mechanisms
G06F 3/0665: at area level, e.g. provisi...
G06F 3/067: Distributed or networked st...
G06F 3/0685: Hybrid storage combining he...
G06F 9/45558: Hypervisor-specific managem...
G06F 9/5077: Logical partitioning of res...
H04L 43/0817: by checking functioning
H04L 61/5007: Internet protocol [IP] addr...
H04L 61/5061: Pools of addresses
H04L 67/10: in which an application is ...
H04L 9/3242: involving keyed hash functi...
H04L 9/3247: involving digital signatures
