File Deduplication using Copy-on-Write Storage Tiers
First Claim
1. A method of deduplicating files from a primary storage tier by a file virtualization appliance in a file storage system, the method comprising:
associating a number of files from the primary storage tier with a copy-on-write storage tier having a designated mirror server; and
deduplicating the files associated with the copy-on-write storage tier, such deduplicating including:
storing in the designated mirror server a single copy of the file contents for each duplicate and non-duplicate file associated with the copy-on-write storage tier;
deleting the file contents from each deduplicated file in the copy-on-write storage tier to leave a sparse file; and
storing metadata for each of the files, the metadata associating each sparse file with the corresponding single copy of the file contents stored in the designated mirror server.
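The three deduplicating steps recited above can be illustrated with a minimal in-memory sketch. It is not the patented implementation. The names `CowFile`, `MirrorServer`, and `deduplicate` are hypothetical, and the use of a SHA-256 digest to recognize identical contents is an assumption, since the claim does not say how duplicates are detected.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CowFile:
    """A file in the copy-on-write tier (hypothetical in-memory model)."""
    size: int                # logical size, kept after deduplication
    data: Optional[bytes]    # None once the contents are removed (sparse file)


@dataclass
class MirrorServer:
    """Holds a single copy of each distinct file content."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def store(self, contents: bytes) -> str:
        # Assumption: identical contents are recognized by their SHA-256 digest.
        key = hashlib.sha256(contents).hexdigest()
        self.blobs.setdefault(key, contents)  # a second identical copy is not stored
        return key


def deduplicate(tier: Dict[str, CowFile], mirror: MirrorServer) -> Dict[str, str]:
    """Dedupe every file in the tier, duplicate or not.

    Returns metadata mapping each file path to the key of its mirror copy.
    """
    metadata: Dict[str, str] = {}
    for path, cow_file in tier.items():
        if cow_file.data is None:
            continue  # already sparse, nothing left to move
        metadata[path] = mirror.store(cow_file.data)  # store the single copy
        cow_file.data = None                          # leave a sparse file; size remains
    return metadata
```

In this sketch, two files with the same contents end up pointing at the same mirror key, while a file with unique contents is still moved to the mirror and left sparse, matching the "each duplicate and non-duplicate file" wording of the claim.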
Abstract
A method and apparatus for removing duplicated data in a file system utilizing copy-on-write storage tiers. A synthetic namespace is created via file virtualization, and is comprised of one or more file systems. Deduplication is applied at the namespace level and on all of the file systems comprising the synthetic namespace. A set of storage policies selects a set of files from the namespace that become the candidates for deduplication. The entire chosen set is migrated to a Copy-On-Write (COW) storage tier. This Copy-On-Write storage tier may be a virtual storage tier that resides within another physical storage tier (such as tier-1 or tier-2 storage). Each file stored in a Copy-On-Write storage tier is deduped, regardless of whether there is any file with identical contents in the set or in the COW storage tier. After deduplication, the deduped file becomes a sparse file in which all of the file's storage space is reclaimed while all of the file's attributes, including size, remain. A copy of each file that is deduped is left as a mirror copy and is stored in a mirror server. If two mirror copies have identical contents, only one mirror copy will be stored in the mirror server. Read access to a file in the COW storage tier (COW file) is redirected to its mirror copy if the file is deduped. When the first write to a COW file is received, the mirror copy stored in the mirror server is copied as the contents of the COW file, and the association from the COW file to its mirror copy is discarded. Thereafter, access to the “un-deduped” file will resume normally from the COW file.
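The read and write behavior described in the abstract can be sketched as follows. This is an illustrative model only. The class `CowTier` and its attributes are hypothetical names, and the mirror server is reduced to a plain dictionary for brevity.

```python
from typing import Dict


class CowTier:
    """Hypothetical model of a COW storage tier backed by a mirror store."""

    def __init__(self, mirror: Dict[str, bytes]):
        self.mirror = mirror                # mirror key -> single stored copy
        self.local: Dict[str, bytes] = {}   # path -> contents held in the tier
        self.links: Dict[str, str] = {}     # path -> mirror key, for deduped files

    def read(self, path: str) -> bytes:
        key = self.links.get(path)
        if key is not None:
            return self.mirror[key]         # deduped: redirect the read to the mirror copy
        return self.local[path]             # un-deduped: read the COW file itself

    def write(self, path: str, offset: int, data: bytes) -> None:
        key = self.links.pop(path, None)    # discard the association, if any
        if key is not None:
            # First write: copy the mirror contents back into the COW file.
            self.local[path] = self.mirror[key]
        buf = bytearray(self.local.get(path, b""))
        if offset > len(buf):
            buf.extend(b"\x00" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        self.local[path] = bytes(buf)
```

Note that in this sketch the mirror copy itself is never modified by a write, so other deduped files that share the same mirror copy keep reading the original contents.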
30 Claims
1. A method of deduplicating files from a primary storage tier by a file virtualization appliance in a file storage system, the method comprising:
associating a number of files from the primary storage tier with a copy-on-write storage tier having a designated mirror server; and
deduplicating the files associated with the copy-on-write storage tier, such deduplicating including:
storing in the designated mirror server a single copy of the file contents for each duplicate and non-duplicate file associated with the copy-on-write storage tier;
deleting the file contents from each deduplicated file in the copy-on-write storage tier to leave a sparse file; and
storing metadata for each of the files, the metadata associating each sparse file with the corresponding single copy of the file contents stored in the designated mirror server.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A file virtualization appliance for deduplicating files from a primary storage tier in a file storage system, the file virtualization appliance comprising:
a network interface for communication with the file servers; and
a processor coupled to the network interface and configured to associate a number of files from the primary storage tier with a copy-on-write storage tier having a designated mirror server and to deduplicate the files associated with the copy-on-write storage tier, such deduplicating including:
storing in the designated mirror server a single copy of the file contents for each duplicate and non-duplicate file associated with the copy-on-write storage tier;
deleting the file contents from each deduplicated file in the copy-on-write storage tier to leave a sparse file; and
storing metadata for each of the files, the metadata associating each sparse file with the corresponding single copy of the file contents stored in the designated mirror server.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification