File deduplication and scan reduction in a virtualization environment
Abstract
A virtual machine template is created. The template includes a file system containing files to be deduplicated across multiple virtual machines. For each file to deduplicate, a hash of the content is generated and stored in association with the file. The content of the file is moved from the virtual machine template to a central file store. The entry for the file in the store is indexed according to the hash. Multiple virtual machines are created by cloning the template, each containing a copy of the template's file system and the hashes stored locally in association with the corresponding deduplicated files. File access operations are monitored on each one of the multiple virtual machines, and attempts to access a deduplicated file are detected. In response, the corresponding locally stored hash is used to retrieve the content of the file from the central file store and provide it to the virtual machine.
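The flow described in the abstract can be illustrated with a minimal in-memory sketch. Everything named here (`FileEntry`, `VirtualMachine`, `CentralFileStore`, `deduplicate_template`, `clone`, `read_file`) is hypothetical and chosen for illustration only. The patent text does not name a hash function, so SHA-256 is an assumption, and real virtual machines, file-system hooks, and cloning are replaced by plain Python objects.

```python
import copy
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileEntry:
    """One file-system entry; content is None once it has been deduplicated."""
    content: Optional[bytes]
    content_hash: Optional[str] = None  # hash stored locally with the entry


@dataclass
class VirtualMachine:
    """Hypothetical stand-in for a template or a clone: just a file system."""
    files: Dict[str, FileEntry] = field(default_factory=dict)


class CentralFileStore:
    """Content store indexed by hash, kept outside the template and all clones."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, digest: str, content: bytes) -> None:
        self._blobs.setdefault(digest, content)

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def deduplicate_template(template: VirtualMachine, store: CentralFileStore) -> None:
    """Hash each file, keep the hash locally, move the content to the store."""
    for entry in template.files.values():
        if entry.content is None:
            continue  # already deduplicated
        digest = hashlib.sha256(entry.content).hexdigest()  # SHA-256 is assumed
        store.put(digest, entry.content)
        entry.content_hash = digest
        entry.content = None  # content no longer lives on the template


def clone(template: VirtualMachine) -> VirtualMachine:
    """Each clone gets its own copy of the file system and of the hashes."""
    return copy.deepcopy(template)


def read_file(vm: VirtualMachine, path: str, store: CentralFileStore) -> bytes:
    """Stand-in for the monitored access path: fetch by hash when content is absent."""
    entry = vm.files[path]
    if entry.content is not None:
        return entry.content
    if entry.content_hash is None:
        raise FileNotFoundError(path)
    return store.get(entry.content_hash)


template = VirtualMachine({"/bin/tool": FileEntry(b"binary image")})
store = CentralFileStore()
deduplicate_template(template, store)
vm_a, vm_b = clone(template), clone(template)
data = read_file(vm_a, "/bin/tool", store)
```

In this sketch both clones hold only the hash for `/bin/tool`, and a read on either one is served from the single copy in the store.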
20 Claims
1. A computer implemented method for enabling file deduplication and scan reduction across multiple virtual machines in a virtualization environment of a single host computer, the method comprising the steps of:
creating a virtual machine template based upon which to create multiple virtual machines on the host computer, wherein the virtual machine template comprises a description of a virtual machine in a known good state having a file system containing at least some files to deduplicate across the multiple virtual machines;
for each specific file in the file system of the virtual machine template to deduplicate across the multiple virtual machines, deduplicating the specific file by;
generating a hash of content of the specific file;
storing the generated hash locally on the virtual machine template in association with the specific file; and
moving the content of the specific file from the virtual machine template to a central file store residing independently of the virtual machine template and the multiple virtual machines;
creating multiple virtual machines by cloning the virtual machine template, wherein each one of the multiple virtual machines cloned from the virtual machine template contains a copy of the file system of the virtual machine template and a copy of the generated hashes of the content of the deduplicated files, the copy of the hashes being stored locally on the specific virtual machine in association with the corresponding deduplicated files;
monitoring file access operations on each one of the multiple virtual machines cloned from the virtual machine template;
on each one of the multiple virtual machines cloned from the virtual machine template, in response to detecting an attempt to access a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine, using a corresponding hash stored locally on the specific virtual machine in association with the specific file to retrieve the content of the specific file from the central file store;
in response to a specific virtual machine updating a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine;
storing the updated file on the specific virtual machine and not in the central file store; and
deleting the hash stored in association with the updated file from the specific virtual machine; and
in response to a specific virtual machine deleting a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine;
deleting the hash stored in association with the specific file from the specific virtual machine and deleting an entry for the specific file from the file system of the specific virtual machine.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
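The update and delete steps that close claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the per-VM file system is modeled as a plain dictionary, and `FileEntry`, `update_file`, and `delete_file` are hypothetical names that do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileEntry:
    """Hypothetical per-VM file-system entry."""
    content: Optional[bytes]            # None while content lives only in the central store
    content_hash: Optional[str] = None  # local hash, present only while deduplicated


def update_file(files: Dict[str, FileEntry], path: str, new_content: bytes) -> None:
    """On update: keep the new content on the VM and drop the local hash.

    Nothing is written to the central store, so other clones are unaffected.
    """
    entry = files[path]
    entry.content = new_content
    entry.content_hash = None


def delete_file(files: Dict[str, FileEntry], path: str) -> None:
    """On delete: remove the file-system entry, and the local hash with it.

    The content in the central store is left alone for the other clones.
    """
    del files[path]


# Placeholder hash strings; a real system would hold actual content hashes.
files = {
    "/etc/app.conf": FileEntry(None, "abc123"),
    "/tmp/old": FileEntry(None, "def456"),
}
update_file(files, "/etc/app.conf", b"port=8080")
delete_file(files, "/tmp/old")
```

After these calls, `/etc/app.conf` is an ordinary local file with no hash, and `/tmp/old` no longer appears in the clone's file system.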
13. At least one non-transitory computer readable-storage medium for enabling file deduplication and scan reduction across multiple virtual machines in a virtualization environment of a single host computer, the at least one non-transitory computer readable-storage medium storing computer executable instructions that, when loaded into computer memory and executed by at least one processor of a computing device, cause the computing device to perform the following steps:
creating a virtual machine template based upon which to create multiple virtual machines on the host computer, wherein the virtual machine template comprises a description of a virtual machine in a known good state having a file system containing at least some files to deduplicate across the multiple virtual machines;
for each specific file in the file system of the virtual machine template to deduplicate across the multiple virtual machines, deduplicating the specific file by;
generating a hash of content of the specific file;
storing the generated hash locally on the virtual machine template in association with the specific file; and
moving the content of the specific file from the virtual machine template to a central file store residing independently of the virtual machine template and the multiple virtual machines;
creating multiple virtual machines by cloning the virtual machine template, wherein each one of the multiple virtual machines cloned from the virtual machine template contains a copy of the file system of the virtual machine template and a copy of the generated hashes of the content of the deduplicated files, the copy of the hashes being stored locally on the specific virtual machine in association with the corresponding deduplicated files;
monitoring file access operations on each one of the multiple virtual machines cloned from the virtual machine template;
on each one of the multiple virtual machines cloned from the virtual machine template, in response to detecting an attempt to access a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine, using a corresponding hash stored locally on the specific virtual machine in association with the specific file to retrieve the content of the specific file from the central file store; and
in response to a specific virtual machine updating a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine;
storing the updated file on the specific virtual machine and not in the central file store; and
deleting the hash stored in association with the updated file from the specific virtual machine; and
in response to a specific virtual machine deleting a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine;
deleting the hash stored in association with the specific file from the specific virtual machine and deleting an entry for the specific file from the file system of the specific virtual machine.
Dependent claims: 14, 15
16. A computer implemented method for enabling file deduplication and scan reduction across multiple virtual machines in a virtualization environment of a single host computer, the method comprising the steps of:
creating a virtual machine template based upon which to create multiple virtual machines on the host computer, wherein the virtual machine template comprises a description of a virtual machine in a known good state having a file system containing at least some files to deduplicate across the multiple virtual machines;
for each specific file in the file system of the virtual machine template to deduplicate across the multiple virtual machines, deduplicating the specific file by;
generating a hash of content of the specific file;
storing the generated hash locally on the virtual machine template in association with the specific file; and
moving the content of the specific file from the virtual machine template to a central file store residing independently of the virtual machine template and the multiple virtual machines;
creating multiple virtual machines by cloning the virtual machine template, wherein each one of the multiple virtual machines cloned from the virtual machine template contains a copy of the file system of the virtual machine template and a copy of the generated hashes of the content of the deduplicated files, the copy of the hashes being stored locally on the specific virtual machine in association with the corresponding deduplicated files;
monitoring file access operations on each one of the multiple virtual machines cloned from the virtual machine template;
on each one of the multiple virtual machines cloned from the virtual machine template, in response to detecting an attempt to access a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine, using a corresponding hash stored locally on the specific virtual machine in association with the specific file to retrieve the content of the specific file from the central file store; and
in response to a specific virtual machine updating a deduplicated file the content of which is in the central file store and is not present on the specific virtual machine;
storing a delta from original content to updated content of the deduplicated file in the central file store;
generating a hash of the updated content of the specific file; and
storing the generated hash of the updated content on the specific virtual machine in association with the specific file.
Dependent claims: 17, 18, 19, 20
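Claim 16 differs from claim 1 in how an update is handled: a delta goes to the central file store and the virtual machine keeps a hash of the updated content. The sketch below shows one way this could work. The claim does not specify a delta format or how deltas are indexed, so the copy/data operation list built with Python's `difflib`, the keying of each delta by the hash of the updated content together with the hash of its base, and the use of SHA-256 are all assumptions, as are the names `make_delta`, `apply_delta`, `CentralFileStore`, and `update_deduplicated_file`.

```python
import difflib
import hashlib
from typing import Dict, List, Tuple

Delta = List[tuple]  # ("copy", start, end) or ("data", bytes); assumed format


def make_delta(old: bytes, new: bytes) -> Delta:
    """Describe `new` as copies from `old` plus literal new bytes."""
    ops: Delta = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))
        # "delete" contributes nothing to the updated content
    return ops


def apply_delta(old: bytes, ops: Delta) -> bytes:
    """Rebuild the updated content from the original content and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class CentralFileStore:
    """Hypothetical store: full blobs by hash, plus deltas by updated-content hash."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.deltas: Dict[str, Tuple[str, Delta]] = {}  # new hash -> (base hash, delta)

    def get(self, digest: str) -> bytes:
        if digest in self.blobs:
            return self.blobs[digest]
        base_hash, ops = self.deltas[digest]
        return apply_delta(self.get(base_hash), ops)


def update_deduplicated_file(local_hashes: Dict[str, str], path: str,
                             new_content: bytes, store: CentralFileStore) -> None:
    """Store a delta centrally and replace the VM's local hash with the new one."""
    old_hash = local_hashes[path]
    original = store.get(old_hash)
    new_hash = hashlib.sha256(new_content).hexdigest()  # SHA-256 is assumed
    store.deltas[new_hash] = (old_hash, make_delta(original, new_content))
    local_hashes[path] = new_hash


store = CentralFileStore()
base = b"hello world"
base_hash = hashlib.sha256(base).hexdigest()
store.blobs[base_hash] = base
local_hashes = {"/data/msg.txt": base_hash}  # one clone's locally stored hashes
update_deduplicated_file(local_hashes, "/data/msg.txt", b"hello brave world", store)
restored = store.get(local_hashes["/data/msg.txt"])
```

In this sketch the updated file stays deduplicated: the clone still holds only a hash, and the store reconstructs the updated content from the original blob and the delta on demand.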
Specification