Network optimized deduplication of virtual machine snapshots
2 Assignments
0 Petitions
Abstract
Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near-instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.
20 Claims
1. A method for operating a data management system, comprising:
acquiring a first snapshot of a first virtual machine;
storing the first snapshot using a first storage device of a first type;
generating a first plurality of hash values corresponding to a first signature by sampling a first plurality of data blocks within the first snapshot, the first plurality of data blocks includes a first region of data blocks wherein at least two or more data blocks are spaced at a fixed distance from each other and a second region of data blocks wherein at least two or more data blocks are spaced at increasingly greater distances from each other, the second region does not overlap with the first region;
acquiring a second snapshot of a second virtual machine subsequent to acquiring the first snapshot of the first virtual machine, the first virtual machine and the second virtual machine comprise different virtual machines;
generating a second plurality of hash values corresponding to a second signature by sampling a second plurality of data blocks within the second snapshot;
determining a matching score between the first signature and the second signature by comparing the first plurality of hash values to the second plurality of hash values;
generating a dependent base file for the second virtual machine based on the matching score, wherein the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine;
storing the dependent base file using a second storage device of a second type different from the first type; and
generating a third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine, wherein the generating the third snapshot includes concurrently reading the dependent base file for the second virtual machine from the second storage device of the second type while reading the first snapshot of the first virtual machine from the first storage device of the first type.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
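The signature-generation step of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 4 KiB block size, the fixed stride in the first region, the doubling gaps in the second region, SHA-256 as the hash, and the names `sample_offsets` and `signature` are all hypothetical choices made for the example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def sample_offsets(num_blocks: int, first_region_end: int, stride: int) -> List[int]:
    """Return the block indices to sample.

    First region [0, first_region_end): blocks spaced a fixed `stride` apart.
    Second region [first_region_end, num_blocks): gaps of 1, 2, 4, 8, ...
    blocks, so samples are spaced at increasingly greater distances.
    The two regions do not overlap.
    """
    offsets = list(range(0, min(first_region_end, num_blocks), stride))
    idx, gap = first_region_end, 1
    while idx < num_blocks:
        offsets.append(idx)
        idx += gap
        gap *= 2  # hypothetical growth rule: double the gap after each sample
    return offsets


def signature(snapshot: bytes, first_region_end: int = 64, stride: int = 8) -> List[str]:
    """Hash each sampled block; the ordered list of hashes is the signature."""
    num_blocks = (len(snapshot) + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [
        hashlib.sha256(snapshot[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]).hexdigest()
        for i in sample_offsets(num_blocks, first_region_end, stride)
    ]
```

For example, `sample_offsets(100, 16, 4)` returns `[0, 4, 8, 12, 16, 17, 19, 23, 31, 47, 79]`. In this sketch the fixed-stride region sits at the start of the image and the growing-gap region follows it; the claim only requires that the two regions not overlap and does not fix their order, stride, or growth rate.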
14. A data management system, comprising:
a first storage device configured to store a first snapshot of a first virtual machine;
a second storage device; and
one or more processors in communication with the first storage device and the second storage device, the one or more processors configured to acquire a second snapshot of a second virtual machine different from the first virtual machine, the one or more processors configured to generate a first plurality of hash values corresponding with a sampling of a first plurality of data blocks within the first snapshot and generate a second plurality of hash values corresponding with a sampling of a second plurality of data blocks within the second snapshot, the first plurality of data blocks within the first snapshot is arranged such that two or more data blocks of a first set of the first plurality of data blocks are spaced at a fixed distance from each other while two or more data blocks of a second set of the first plurality of data blocks are spaced at increasingly greater distances from each other, the first set of the first plurality of data blocks corresponds with a first region within the first snapshot and the second set of the first plurality of data blocks corresponds with a second region within the first snapshot that does not overlap with the first region, the one or more processors configured to compare the first plurality of hash values with the second plurality of hash values and determine a matching score based on a number of matched hashes between the first plurality of hash values and the second plurality of hash values based on the comparison, the one or more processors configured to generate a dependent base file for the second virtual machine based on the matching score, the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine, the one or more processors configured to store the dependent base file using a second storage device different from the first storage device and generate a third snapshot of the second virtual machine, the one or more processors configured to concurrently read the dependent base file from the second storage device and the first snapshot of the first virtual machine from the first storage device, the one or more processors configured to generate the third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine.
(Dependent claims: 15, 16, 17, 18, 19)
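Claim 14 ties the matching score to the number of matched hashes and conditions the dependent base file on that score. A minimal sketch of both steps follows. Comparing hashes position by position, the `threshold` parameter, representing a snapshot as a list of equal-sized blocks, recording changed blocks by index, and the names `matching_score`, `dependent_base_file`, and `maybe_dedupe` are assumptions for illustration, not details taken from the claim.

```python
from typing import Dict, List, Optional, Tuple

DependentBase = Tuple[int, Dict[int, bytes]]  # (target block count, changed blocks)


def matching_score(sig_a: List[str], sig_b: List[str]) -> int:
    """Number of matched hashes, comparing hashes taken at the same sampled position."""
    return sum(1 for a, b in zip(sig_a, sig_b) if a == b)


def dependent_base_file(base_blocks: List[bytes], target_blocks: List[bytes]) -> DependentBase:
    """Record only the target blocks that differ from (or extend past) the base."""
    changed = {
        i: blk
        for i, blk in enumerate(target_blocks)
        if i >= len(base_blocks) or base_blocks[i] != blk
    }
    return len(target_blocks), changed


def maybe_dedupe(
    base_sig: List[str],
    target_sig: List[str],
    base_blocks: List[bytes],
    target_blocks: List[bytes],
    threshold: int,
) -> Optional[DependentBase]:
    """Build a dependent base file only if the signatures match well enough."""
    if matching_score(base_sig, target_sig) >= threshold:
        return dependent_base_file(base_blocks, target_blocks)
    # Hypothetical fallback: the caller stores the target as a full base instead.
    return None
```

Keeping the target's block count next to the changed blocks lets a reader rebuild a target that is shorter or longer than the base it depends on.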
20. One or more storage devices containing processor readable code for programming one or more processors to perform a method for operating a data management system, the processor readable code comprising:
processor readable code configured to store a first snapshot of a first virtual machine using a first storage device of a first type;
processor readable code configured to acquire a second snapshot of a second virtual machine different from the first virtual machine;
processor readable code configured to generate a first plurality of hash values corresponding to a first signature by sampling a first plurality of data blocks within the first snapshot, the first plurality of data blocks comprises a first region of data blocks wherein at least two or more data blocks are spaced at a fixed distance from each other and a second region of data blocks wherein at least two or more data blocks are spaced at increasingly greater distances from each other, the first region does not overlap with the second region;
processor readable code configured to generate a second plurality of hash values corresponding to a second signature by sampling a second plurality of data blocks within the second snapshot;
processor readable code configured to compare the first plurality of hash values with the second plurality of hash values and determine a matching score based on a number of matched hashes between the first plurality of hash values and the second plurality of hash values;
processor readable code configured to generate a dependent base file for the second virtual machine based on the matching score, the dependent base file comprises data differences between the first snapshot of the first virtual machine and the second snapshot of the second virtual machine;
processor readable code configured to store the dependent base file using a second storage device of a second type different from the first storage device of the first type; and
processor readable code configured to concurrently read the dependent base file from the second storage device of the second type and the first snapshot of the first virtual machine from the first storage device of the first type and generate a third snapshot of the second virtual machine using the dependent base file for the second virtual machine and the first snapshot of the first virtual machine.
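The final step of claim 20, reading the dependent base file and the base snapshot concurrently and merging them into a third snapshot, might look roughly like the sketch below. It assumes two file paths stand in for the two storage devices, that the dependent base file holds a `(block count, {block index: block bytes})` pair serialized with `pickle`, and a 4 KiB block size; `read_base_snapshot`, `read_dependent_base_file`, and `reconstruct_snapshot` are hypothetical names.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def read_base_snapshot(path: str) -> bytes:
    """Read the full base snapshot (standing in for the first storage device)."""
    with open(path, "rb") as f:
        return f.read()


def read_dependent_base_file(path: str) -> Tuple[int, Dict[int, bytes]]:
    """Read (block count, changed blocks) from the second storage device.

    pickle is a hypothetical on-disk format chosen for brevity; only load
    files this program wrote itself.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def reconstruct_snapshot(base_path: str, diff_path: str) -> bytes:
    # Submit both reads together so the two devices are read concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        base_future = pool.submit(read_base_snapshot, base_path)
        diff_future = pool.submit(read_dependent_base_file, diff_path)
        base = base_future.result()
        num_blocks, changed = diff_future.result()

    # Take changed blocks from the dependent base file and the rest from the base.
    blocks = [
        changed[i] if i in changed else base[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        for i in range(num_blocks)
    ]
    return b"".join(blocks)
```

Threads suit this case because CPython releases the GIL during blocking file reads, so the two reads can overlap when the devices are independent.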
Specification