Network optimized deduplication of virtual machine snapshots

US 10,282,112 B2
Filed: 02/20/2015
Issued: 05/07/2019
Est. Priority Date: 11/04/2014
Status: Active Grant

Abstract

Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.

20 Claims

1. A method for operating a data management system, comprising:
- acquiring a first snapshot of a first virtual machine;
  
  storing the first snapshot using a first storage device of a first type;
  
  generating a first plurality of hash values corresponding to a first signature by sampling a first plurality of data blocks within the first snapshot, the first plurality of data blocks includes a first region of data blocks wherein at least two or more data blocks are spaced at a fixed distance from each other and a second region of data blocks wherein at least two or more data blocks are spaced at increasingly greater distances from each other, the second region does not overlap with the first region;
  
  acquiring a second snapshot of a second virtual machine subsequent to acquiring the first snapshot of the first virtual machine, the first virtual machine and the second virtual machine comprise different virtual machines;
  
  generating a second plurality of hash values corresponding to a second signature by sampling a second plurality of data blocks within the second snapshot;
  
  determining a matching score between the first signature and the second signature by comparing the first plurality of hash values to the second plurality of hash values;
  
  generating a dependent base file for the second virtual machine based on the matching score, wherein the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine;
  
  storing the dependent base file using a second storage device of a second type different from the first type;
  
  and generating a third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine, wherein the generating the third snapshot includes concurrently reading the dependent base file for the second virtual machine from the second storage device of the second type while reading the first snapshot of the first virtual machine from the first storage device of the first type.
  - 2. The method of claim 1, further comprising:
    - acquiring one or more incremental files corresponding with one or more snapshots of the second virtual machine subsequent to acquiring the second snapshot; and
      
      storing the one or more incremental files using the second storage device of the second type different from the first storage device of the first type, the generating the third snapshot of the second virtual machine includes reading the dependent base file and the one or more incremental files from the second storage device of the second type while reading the first snapshot of the first virtual machine from the first storage device of the first type.
  - 3. The method of claim 1, wherein:
    - the second virtual machine is not a clone of the first virtual machine.
  - 4. The method of claim 1, wherein:
    - the first storage device of the first type comprises a magnetic storage device; and
      
      the second storage device of the second type comprises a solid-state storage device.
  - 5. The method of claim 4, wherein:
    - the magnetic storage device comprises a hard disk drive; and
      
      the solid-state storage device comprises a flash-based memory.
  - 6. The method of claim 1, wherein:
    - the first storage device is located within a first physical machine and the second storage device is located within the first physical machine.
  - 7. The method of claim 1, wherein:
    - the first snapshot comprises a first full image snapshot of the first virtual machine;
      
      the second snapshot comprises a second full image snapshot of the second virtual machine;
      
      the first virtual machine corresponds with a first set of virtual machine configuration settings; and
      
      the second virtual machine corresponds with a second set of virtual machine configuration settings different from the first set of virtual machine configuration settings.
  - 8. The method of claim 1, further comprising:
    - receiving an instruction to output a file associated with the second snapshot of the second virtual machine;
      
      generating the file using the dependent base file and the first snapshot of the first virtual machine in response to receiving the instruction; and
      
      outputting at least a portion of the file.
  - 9. The method of claim 8, wherein:
    - the generating the file includes acquiring the dependent base file from the second storage device while acquiring the first snapshot from the first storage device.
  - 10. The method of claim 8, wherein:
    - the generating the file includes acquiring the dependent base file from the second storage device during a first period of time and acquiring at least a portion of the first snapshot from the first storage device during the first period of time.
  - 11. The method of claim 8, wherein:
    - the generating the file includes reading the dependent base file from the second storage device in parallel with reading the first snapshot from the first storage device.
  - 12. The method of claim 8, wherein:
    - the generating the file includes generating the file associated with the second snapshot of the second virtual machine by patching the dependent base file to the first snapshot of the first virtual machine.
  - 13. The method of claim 1, further comprising:
    - generating a merged file corresponding with the second snapshot of the second virtual machine, the merged file includes a first pointer to the first snapshot and a second pointer to the dependent base file; and
      
      storing the merged file using a distributed metadata store.
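
The sampling step of claim 1 can be illustrated with a minimal sketch. The claim only requires one region whose sampled blocks sit at a fixed distance and a second, non-overlapping region whose sampled blocks sit at increasingly greater distances. Everything else here is an assumption made for illustration: the 4 KiB block size, the stride of 8, the 64-block boundary between regions, the gap-doubling rule, the use of SHA-256, and the hypothetical names `sample_offsets` and `signature`.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes; the claims do not fix one


def sample_offsets(num_blocks, fixed_region_blocks=64, fixed_stride=8):
    """Return the block indices to sample.

    Region 1 covers blocks [0, fixed_region_blocks) and is sampled at a
    fixed stride. Region 2 covers the remaining blocks and is sampled with
    a gap that doubles after every sample (an assumed growth rule).
    """
    fixed_end = min(fixed_region_blocks, num_blocks)
    indices = list(range(0, fixed_end, fixed_stride))
    gap = fixed_stride
    pos = fixed_end
    while pos < num_blocks:
        indices.append(pos)
        gap *= 2      # each gap is larger than the previous one
        pos += gap
    return indices


def signature(snapshot, fixed_region_blocks=64, fixed_stride=8):
    """Hash each sampled block; the list of hashes is the signature."""
    num_blocks = (len(snapshot) + BLOCK_SIZE - 1) // BLOCK_SIZE
    hashes = []
    for i in sample_offsets(num_blocks, fixed_region_blocks, fixed_stride):
        block = snapshot[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        hashes.append(hashlib.sha256(block).hexdigest())
    return hashes
```

With these defaults, a 256-block image is sampled at blocks 0, 8, ..., 56 in the first region and at blocks 64, 80, 112, 176 in the second, so the signature stays small even for large images.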

14. A data management system, comprising:
- a first storage device configured to store a first snapshot of a first virtual machine;
  
  a second storage device; and
  
  one or more processors in communication with the first storage device and the second storage device, the one or more processors configured to acquire a second snapshot of a second virtual machine different from the first virtual machine, the one or more processors configured to generate a first plurality of hash values corresponding with a sampling of a first plurality of data blocks within the first snapshot and generate a second plurality of hash values corresponding with a sampling of a second plurality of data blocks within the second snapshot, the first plurality of data blocks within the first snapshot is arranged such that two or more data blocks of a first set of the first plurality of data blocks are spaced at a fixed distance from each other while two or more data blocks of a second set of the first plurality of data blocks are spaced at increasingly greater distances from each other, the first set of the first plurality of data blocks corresponds with a first region within the first snapshot and the second set of the first plurality of data blocks corresponds with a second region within the first snapshot that does not overlap with the first region, the one or more processors configured to compare the first plurality of hash values with the second plurality of hash values and determine a matching score based on a number of matched hashes between the first plurality of hash values and the second plurality of hash values based on the comparison, the one or more processors configured to generate a dependent base file for the second virtual machine based on the matching score, the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine, the one or more processors configured to store the dependent base file using a second storage device different from the first storage device and generate a third snapshot of the second virtual machine, the one or more processors configured to concurrently read the dependent base file from the second storage device and the first snapshot of the first virtual machine from the first storage device, the one or more processors configured to generate the third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine.
  - 15. The data management system of claim 14, wherein:
    - the first storage device comprises a magnetic storage device; and
      
      the second storage device comprises a solid-state storage device.
  - 16. The data management system of claim 15, wherein:
    - the magnetic storage device comprises a hard disk drive; and
      
      the solid-state storage device comprises a flash-based memory.
  - 17. The data management system of claim 14, wherein:
    - the one or more processors configured to receive an instruction to output a file associated with the second snapshot of the second virtual machine, the one or more processors configured to generate the file using the dependent base file and the first snapshot of the first virtual machine in response to reception of the instruction, the one or more processors configured to cause at least a portion of the file to be outputted from the data management system.
  - 18. The data management system of claim 17, wherein:
    - the one or more processors configured to acquire the dependent base file from the second storage device while the first snapshot is acquired from the first storage device, the one or more processors configured to patch the dependent base file to the first snapshot of the first virtual machine to generate the file associated with the second snapshot of the second virtual machine.
  - 19. The data management system of claim 17, wherein:
    - the one or more processors configured to read the first snapshot from the first storage device in parallel with the dependent base file from the second storage device, the one or more processors configured to combine the dependent base file with the first snapshot of the first virtual machine to generate the file associated with the second snapshot of the second virtual machine.
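
The dependent base file and the parallel-read reconstruction recited in claims 14 and 17 to 19 can be sketched as follows. This is only an illustration under stated assumptions: the claims say the dependent base file "comprises data differences" but do not give its layout, so the block-level dictionary format, the 4 KiB block size, and the hypothetical names `make_dependent_base`, `patch`, and `rebuild_snapshot` are all invented here. The two reader callables stand in for the two storage devices.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size in bytes


def make_dependent_base(base, target):
    """Record only the blocks of `target` that differ from `base`."""
    changed = {}
    num_blocks = (len(target) + BLOCK_SIZE - 1) // BLOCK_SIZE
    for i in range(num_blocks):
        lo, hi = i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE
        if target[lo:hi] != base[lo:hi]:
            changed[i] = target[lo:hi]
    return {"length": len(target), "blocks": changed}


def patch(base, dependent):
    """Apply the recorded differences on top of the base snapshot."""
    length = dependent["length"]
    # Truncate or zero-pad the base so it has the target's length.
    out = bytearray(base[:length].ljust(length, b"\0"))
    for i, data in dependent["blocks"].items():
        lo = i * BLOCK_SIZE
        out[lo:lo + len(data)] = data
    return bytes(out)


def rebuild_snapshot(read_base, read_dependent):
    """Read both inputs concurrently, then patch.

    `read_base` and `read_dependent` are caller-supplied functions, for
    example one reading from a hard disk and one from flash storage.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        base_future = pool.submit(read_base)
        dependent_future = pool.submit(read_dependent)
        return patch(base_future.result(), dependent_future.result())
```

Issuing both reads before waiting on either result lets the slower device's latency overlap with the faster one's, which is the apparent motivation for placing the two files on different device types.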

20. One or more storage devices containing processor readable code for programming one or more processors to perform a method for operating a data management system, the processor readable code comprising:
- processor readable code configured to store a first snapshot of a first virtual machine using a first storage device of a first type;
  
  processor readable code configured to acquire a second snapshot of a second virtual machine different from the first virtual machine;
  
  processor readable code configured to generate a first plurality of hash values corresponding to a first signature by sampling a first plurality of data blocks within the first snapshot, the first plurality of data blocks comprises a first region of data blocks wherein at least two or more data blocks are spaced at a fixed distance from each other and a second region of data blocks wherein at least two or more data blocks are spaced at increasingly greater distances from each other, the first region does not overlap with the second region;
  
  processor readable code configured to generate a second plurality of hash values corresponding to a second signature by sampling a second plurality of data blocks within the second snapshot;
  
  processor readable code configured to compare the first plurality of hash values with the second plurality of hash values and determine a matching score based on a number of matched hashes between the first plurality of hash values and the second plurality of hash values;
  
  processor readable code configured to generate a dependent base file for the second virtual machine based on the matching score, the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine;
  
  processor readable code configured to store the dependent base file using a second storage device of a second type different from the first storage device of the first type; and
  
  processor readable code configured to concurrently read the dependent base file from the second storage device of the second type and the first snapshot of the first virtual machine from the first storage device of the first type and generate a third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine.
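
Claim 20 bases the matching score on the number of matched hashes and then generates the dependent base file "based on the matching score". A minimal sketch of one way to do that is shown below. The position-by-position comparison, the normalization to a fraction, the 0.5 threshold, and the idea of choosing among several stored signatures are assumptions for illustration, as are the names `matching_score` and `choose_base`; the claim itself only compares two signatures.

```python
def matching_score(sig_a, sig_b):
    """Fraction of sampled positions whose hashes agree.

    Hashes are compared position by position, which assumes both
    signatures were sampled at the same block offsets.
    """
    if not sig_a or not sig_b:
        return 0.0
    matched = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matched / max(len(sig_a), len(sig_b))


def choose_base(new_sig, stored_sigs, threshold=0.5):
    """Return the id of the best-matching stored snapshot, or None.

    `stored_sigs` maps a snapshot id to its signature. None means no
    candidate scored at least `threshold`, so the new snapshot would be
    stored in full rather than as a dependent base file.
    """
    best_id, best_score = None, 0.0
    for snap_id, sig in stored_sigs.items():
        score = matching_score(new_sig, sig)
        if score > best_score:
            best_id, best_score = snap_id, score
    return best_id if best_score >= threshold else None
```

A set-intersection count would be an alternative way to tally matched hashes when the two images are not block-aligned; the claim language does not choose between the two.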

Current Assignee: Rubrik, Inc.
Original Assignee: Rubrik, Inc.
Inventors: Jain, Arvind; Botelho, Fabiano; Nithrakashyap, Arvind
Primary Examiner(s): Jalil, Neveen Abel
Assistant Examiner(s): Baker, Irene
Application Number: US14/628,024
Publication Number: US 20160125058A1
Time in Patent Office: 1,537 Days
Field of Search: None

CPC Class Codes

G06F 11/1435 : using file system or storag...

G06F 11/1446 : Point-in-time backing up or...

G06F 11/1448 : Management of the data invo...

G06F 11/1451 : by selection of backup cont...

G06F 11/1453 : using de-duplication of the...

G06F 11/1458 : Management of the backup or...

G06F 11/1461 : Backup scheduling policy

G06F 11/1464 : for networked environments

G06F 11/1484 : involving virtual machines

G06F 11/202 : where processing functional...

G06F 16/113 : Details of archiving lifecy...

G06F 16/128 : Details of file system snap...

G06F 16/13 : File access structures, e.g...

G06F 16/148 : File search processing

G06F 16/27 : Replication, distribution o...

G06F 16/84 : Mapping; Conversion

G06F 2009/45562 : Creating, deleting, cloning...

G06F 2009/4557 : Distribution of virtual mac...

G06F 2009/45579 : I/O management, e.g. provid...

G06F 2009/45583 : Memory management, e.g. acc...

G06F 2201/80 : Database-specific techniques

G06F 2201/815 : Virtual

G06F 2201/84 : Using snapshots, i.e. a log...

G06F 3/0619 : in relation to data integri...

G06F 3/0641 : De-duplication techniques

G06F 3/065 : Replication mechanisms

G06F 3/0665 : at area level, e.g. provisi...

G06F 3/067 : Distributed or networked st...

G06F 3/0685 : Hybrid storage combining he...

G06F 9/45558 : Hypervisor-specific managem...

G06F 9/5077 : Logical partitioning of res...

H04L 43/0817 : by checking functioning

H04L 61/5007 : Internet protocol [IP] addr...

H04L 61/5061 : Pools of addresses

H04L 67/10 : in which an application is ...

H04L 9/3242 : involving keyed hash functi...

H04L 9/3247 : involving digital signatures
