File deduplication using storage tiers
Abstract
A method and apparatus for removing duplicated data in a file system utilizing the concept of storage tiers. A synthetic namespace is created via file virtualization, and is comprised of one or more file systems. Deduplication is applied at the namespace level and on all of the file systems comprising the synthetic namespace. All files in a file system in a higher storage tier whose contents are identical to at least one other file in the synthetic namespace are moved to a destination file system in a lower storage tier. For each set of duplicated files that are moved from the original servers, a single instance copy of the file is left behind as a mirror copy. Read access to a duplicated file is redirected to its mirror copy. When the first write to a duplicated file is received, the association from the duplicated file stored in the destination server to its mirror copy that is stored in the origin server is discarded. Access to the “modified” duplicated file will then resume normally from the destination server.
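The read and write behavior described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DedupNamespace`, its method names, and the in-memory dictionaries standing in for the origin (higher-tier) and destination (lower-tier) servers are all hypothetical.

```python
class DedupNamespace:
    """Toy model of the access path for duplicated files (all names hypothetical)."""

    def __init__(self):
        self.origin = {}       # mirror id -> bytes: single-instance mirror copies (higher tier)
        self.destination = {}  # path -> bytes: duplicated files moved to the lower tier
        self.mirror_of = {}    # path -> mirror id: association kept while the file is unmodified

    def add_duplicate(self, path, data, mirror_id):
        """Register a duplicated file that has been moved to the destination server."""
        self.destination[path] = data
        self.origin.setdefault(mirror_id, data)  # leave one mirror copy behind
        self.mirror_of[path] = mirror_id

    def locate(self, path):
        """Report which server would serve a read of this path."""
        return "origin" if path in self.mirror_of else "destination"

    def read(self, path):
        """Redirect reads of an unmodified duplicate to its mirror copy."""
        mirror_id = self.mirror_of.get(path)
        if mirror_id is not None:
            return self.origin[mirror_id]
        return self.destination[path]

    def write(self, path, data):
        """On the first write, discard the association, then write to the destination."""
        self.mirror_of.pop(path, None)
        self.destination[path] = data


ns = DedupNamespace()
ns.add_duplicate("/docs/a.txt", b"report", "m1")
ns.add_duplicate("/docs/b.txt", b"report", "m1")
ns.write("/docs/a.txt", b"report v2")  # a.txt is now served from the destination
```

After the write, `/docs/a.txt` is read from the destination server while `/docs/b.txt` is still redirected to the shared mirror copy.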
28 Claims
1. A method of deduplicating files, the method comprising:
- accessing, with a file virtualization device, a virtualized environment including one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
- identifying, with the file virtualization device, a subset of the first plurality of files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
- storing, with the file virtualization device, a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
- storing, with the file virtualization device, metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.
8. A virtualization apparatus for deduplicating files, the apparatus comprising:
- at least one communication interface for communicating with one or more primary and secondary storage servers; and
- at least one of configurable hardware logic configured to be capable of implementing or a processor configured to execute program instructions stored in a memory comprising;
  - accessing a virtualized environment including the one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and the one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
  - identifying a subset of the accessed files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
  - storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
  - storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.
15. A system that deduplicates files, the system comprising:
- one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a primary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, the storage servers storing the first and second pluralities of files in a virtualized environment, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
- a file virtualization device including at least one of configurable hardware logic configured to be capable of implementing or a processor configured to execute program instructions stored in a memory comprising;
  - identifying a subset of the plurality of files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
  - storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
  - storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.
22. A non-transitory computer readable medium having stored thereon instructions for deduplicating files comprising machine executable code which when executed by at least one processor, causes the processor to perform steps comprising:
- accessing a plurality of files stored in a virtualized environment including one or more primary storage servers operating as a primary storage tier and storing a first plurality of files and one or more secondary storage servers operating as a secondary storage tier and storing a second plurality of files comprising at least a plurality of files not included in the first plurality of files, wherein a global namespace is associated with the first and second pluralities of files stored in the one or more primary and secondary storage servers;
- identifying a subset of the accessed files that are stored in the primary storage tier and have identical file contents and storing a copy of only the subset of files in the secondary storage tier;
- storing a single copy of the contents of each of the subset of files in the primary storage tier and deleting all other files having identical file contents from the primary storage tier; and
- storing metadata associating each of the copies of the subset of files stored in the secondary storage tier with a corresponding one of the single copies stored in the primary storage tier.
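The identifying, storing, deleting, and metadata steps recited in the independent claims can be illustrated with a short sketch. It is one possible reading under loud assumptions, not the claimed implementation: the function name `deduplicate`, the use of SHA-256 digests to detect identical contents (the claims say only "identical file contents"), the `.mirror/` key naming, and the dictionaries standing in for the two storage tiers are all hypothetical.

```python
import hashlib
from collections import defaultdict


def deduplicate(primary, secondary):
    """Run one deduplication pass over two tiers modeled as dicts of path -> bytes.

    Returns metadata mapping each secondary-tier copy to the key of the
    single copy kept in the primary tier.
    """
    # Identify files in the primary tier that have identical contents.
    # Assumption: equal SHA-256 digests are treated as identical contents.
    groups = defaultdict(list)
    for path, data in primary.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)

    metadata = {}
    for digest, paths in groups.items():
        if len(paths) < 2:
            continue  # contents are unique: the file stays in the primary tier
        mirror_key = f".mirror/{digest}"  # hypothetical name for the single copy
        contents = primary[paths[0]]
        for path in paths:
            # Store a copy in the secondary tier and delete it from the primary tier.
            secondary[path] = primary.pop(path)
            # Associate the secondary copy with the single primary copy.
            metadata[path] = mirror_key
        # Keep exactly one copy of the shared contents in the primary tier.
        primary[mirror_key] = contents
    return metadata


primary = {"/a.txt": b"same", "/b.txt": b"same", "/c.txt": b"unique"}
secondary = {"/old.log": b"archived"}
metadata = deduplicate(primary, secondary)
```

In this run the two identical files move to the secondary tier, one shared copy remains in the primary tier under a `.mirror/` key, and `/c.txt` is untouched because its contents are unique.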
Specification