Deduplication of virtual machine content
First Claim
1. A method for operating a data management system, comprising:
- acquiring a first snapshot of a first virtual machine, the first snapshot comprises a full image of the first virtual machine;
- generating a signature for the first virtual machine using the full image, the generating a signature includes generating a plurality of hash values corresponding with a plurality of data blocks within the full image, the plurality of data blocks is arranged such that data blocks of a first plurality of the plurality of data blocks are spaced at a fixed distance from each other and data blocks of a second plurality of the plurality of data blocks are spaced at monotonically increasing distances from each other;
- identifying a second virtual machine based on the signature, the second virtual machine is associated with a base image;
- generating a dependent base file associated with the first snapshot using the full image and the base image, the dependent base file comprises data differences between the first snapshot of the first virtual machine and a second snapshot of the second virtual machine corresponding with the base image, the first plurality corresponds with a first data region within the full image of the first virtual machine, the second plurality corresponds with a second data region within the full image of the first virtual machine that does not overlap with the first data region; and
- storing the dependent base file.
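The signature step recited above can be illustrated with a short Python sketch. It is not taken from the patent: the choice of SHA-256, the doubling gap schedule, and every name and parameter (sample_offsets, generate_signature, region_split, fixed_stride, first_gap, growth) are assumptions made for illustration only. The sketch samples blocks at a fixed stride in a first region and at growing gaps in a second, non-overlapping region, then hashes each sampled block.

```python
import hashlib
from typing import List


def sample_offsets(image_size: int, block_size: int, region_split: int,
                   fixed_stride: int, first_gap: int, growth: int = 2) -> List[int]:
    """Pick block offsets for a signature (hypothetical sampling scheme).

    Region 1 is [0, region_split): blocks at a fixed distance from each other.
    Region 2 is [region_split, image_size): the distance between consecutive
    blocks grows by a factor of `growth` each step (assumed schedule).
    """
    offsets = []

    # First data region: fixed spacing.
    pos = 0
    while pos + block_size <= region_split:
        offsets.append(pos)
        pos += fixed_stride

    # Second data region (does not overlap the first): increasing spacing.
    pos, gap = region_split, first_gap
    while pos + block_size <= image_size:
        offsets.append(pos)
        pos += gap
        gap *= growth

    return offsets


def generate_signature(image: bytes, block_size: int, region_split: int,
                       fixed_stride: int, first_gap: int, growth: int = 2) -> List[str]:
    """Return an ordered list of hash values, one per sampled block."""
    offsets = sample_offsets(len(image), block_size, region_split,
                             fixed_stride, first_gap, growth)
    # SHA-256 is an assumption; the claim only requires "hash values".
    return [hashlib.sha256(image[o:o + block_size]).hexdigest() for o in offsets]
```

With a 64-byte image, 4-byte blocks, a split at byte 32, a stride of 8, and a first gap of 4, the sampled offsets are 0, 8, 16, 24 in the first region and 32, 36, 44, 60 in the second.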
Abstract
Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.
20 Claims
1. A method for operating a data management system, comprising:
- acquiring a first snapshot of a first virtual machine, the first snapshot comprises a full image of the first virtual machine;
- generating a signature for the first virtual machine using the full image, the generating a signature includes generating a plurality of hash values corresponding with a plurality of data blocks within the full image, the plurality of data blocks is arranged such that data blocks of a first plurality of the plurality of data blocks are spaced at a fixed distance from each other and data blocks of a second plurality of the plurality of data blocks are spaced at monotonically increasing distances from each other;
- identifying a second virtual machine based on the signature, the second virtual machine is associated with a base image;
- generating a dependent base file associated with the first snapshot using the full image and the base image, the dependent base file comprises data differences between the first snapshot of the first virtual machine and a second snapshot of the second virtual machine corresponding with the base image, the first plurality corresponds with a first data region within the full image of the first virtual machine, the second plurality corresponds with a second data region within the full image of the first virtual machine that does not overlap with the first data region; and
- storing the dependent base file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A data management system, comprising:
- a memory configured to store a first snapshot of a first virtual machine, the first snapshot comprises a full image of the first virtual machine; and
- one or more processors configured to generate a signature for the first virtual machine using the full image, the signature includes a plurality of hash values corresponding with a plurality of data blocks within the full image, the plurality of data blocks is arranged such that data blocks of a first plurality of the plurality of data blocks are spaced at a fixed distance from each other and data blocks of a second plurality of the plurality of data blocks are spaced at monotonically increasing distances from each other, the one or more processors configured to identify a second virtual machine different from the first virtual machine based on the signature, the second virtual machine is associated with a base image, the one or more processors configured to generate a dependent base file associated with the first snapshot using the full image and the base image and cause the dependent base file to be stored, the first plurality corresponds with a first data region within the full image of the first virtual machine and the second plurality corresponds with a second data region within the full image of the first virtual machine that does not overlap with the first data region.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
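The claim says the second virtual machine is identified "based on the signature" but does not say how signatures are compared. One plausible reading, sketched below in Python purely as an assumption, is to compare two signatures position by position and pick the stored virtual machine whose signature shares the largest fraction of hash values. The names match_score, identify_base_vm, known_signatures, and min_score are hypothetical.

```python
from typing import Dict, List, Optional


def match_score(sig_a: List[str], sig_b: List[str]) -> float:
    """Fraction of positions at which two signatures hold the same hash."""
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def identify_base_vm(signature: List[str],
                     known_signatures: Dict[str, List[str]],
                     min_score: float = 0.5) -> Optional[str]:
    """Return the id of the best-matching stored VM, or None if no candidate
    reaches min_score (an assumed threshold, not specified by the claims)."""
    best_id, best_score = None, 0.0
    for vm_id, candidate in known_signatures.items():
        score = match_score(signature, candidate)
        if score > best_score:
            best_id, best_score = vm_id, score
    return best_id if best_score >= min_score else None
```

A caller would pass only signatures of other virtual machines in known_signatures, since the claim requires the second virtual machine to differ from the first.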
20. One or more storage devices containing processor readable code for programming one or more processors to perform a method for operating a data management system, the processor readable code comprising:
- processor readable code configured to acquire a first snapshot of a first virtual machine, the first snapshot comprises a full image of the first virtual machine;
- processor readable code configured to generate a signature for the first virtual machine using the full image, the signature comprises an ordered list of a plurality of hash values, the plurality of hash values derive from a plurality of noncontiguous data blocks within the full image, the plurality of noncontiguous data blocks is arranged such that data blocks of a first plurality of the plurality of noncontiguous data blocks are spaced at a fixed distance from each other and data blocks of a second plurality of the plurality of noncontiguous data blocks are spaced at monotonically increasing distances from each other, the first plurality corresponds with a first data region within the full image of the first virtual machine and the second plurality corresponds with a second data region within the full image of the first virtual machine that does not overlap with the first data region;
- processor readable code configured to identify a second virtual machine different from the first virtual machine based on the signature, the second virtual machine is associated with a base image;
- processor readable code configured to generate a dependent base file associated with the first snapshot using the full image and the base image, the dependent base file comprises data differences between the first snapshot of the first virtual machine and a second snapshot of the second virtual machine corresponding with the base image; and
- processor readable code configured to output the dependent base file from the data management system.
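The dependent base file is described as holding the data differences between the first snapshot and the snapshot behind the base image. A minimal Python sketch of that idea follows, assuming a simple block-by-block comparison. The dictionary layout and the names make_dependent_base and restore_full_image are illustrative assumptions, not the format used by the patent.

```python
from typing import Any, Dict


def make_dependent_base(full_image: bytes, base_image: bytes,
                        block_size: int) -> Dict[str, Any]:
    """Record only the blocks of full_image that differ from base_image."""
    diffs = {}
    for off in range(0, len(full_image), block_size):
        block = full_image[off:off + block_size]
        # Blocks past the end of base_image compare unequal and are kept.
        if block != base_image[off:off + block_size]:
            diffs[off] = block
    return {"length": len(full_image), "block_size": block_size, "diffs": diffs}


def restore_full_image(dependent: Dict[str, Any], base_image: bytes) -> bytes:
    """Rebuild the full image by patching the stored differences onto the base."""
    length = dependent["length"]
    # Start from the base image, truncated or zero-padded to the target length.
    out = bytearray(base_image[:length].ljust(length, b"\0"))
    for off, block in dependent["diffs"].items():
        out[off:off + len(block)] = block
    return bytes(out)
```

Storing only the differing blocks is what makes the file "dependent": it cannot be read back without the base image, but it is much smaller than the full image when the two virtual machines share most of their content.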
Specification