Global de-duplication in shared architectures
Abstract
Redundant data is globally de-duplicated across a shared architecture that includes a plurality of storage systems. The storage systems implement copy-on-write or WAFL to generate snapshots of original data. Each storage system includes a de-duplication client to identify and reduce redundant original and/or snapshot data on the storage system. Each de-duplication client can de-duplicate a digital sequence by breaking the sequence into blocks and identifying redundant blocks already stored in the shared architecture. Identifying redundant blocks may include hashing each block and comparing the hash to a local and/or master hash table containing hashes of existing data. Once identified, redundant data previously stored is deleted (e.g., post-process de-duplication), or redundant data is not stored to begin with (e.g., inline de-duplication). In both cases, pointers to shared data blocks can be used to reassemble the digital sequence where one or more blocks were deleted or not stored on the storage system.
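The block-identification step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the fixed block size, the choice of SHA-256, and the names `split_blocks`, `block_hash`, and `classify` are all assumptions, and the local and master hash tables are modeled as plain Python sets.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical value, kept tiny so the example is easy to follow


def split_blocks(sequence: bytes, size: int = BLOCK_SIZE) -> list:
    """Break a digital sequence into fixed-size blocks (the last may be shorter)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def block_hash(block: bytes) -> str:
    """Hash one block; SHA-256 is an assumption, the abstract names no algorithm."""
    return hashlib.sha256(block).hexdigest()


def classify(sequence: bytes, local_table: set, master_table: set) -> list:
    """Label each block 'local', 'remote', or 'new'.

    The local table is consulted first; the master table is consulted
    only for blocks that are not already on this storage system.
    """
    labels = []
    for block in split_blocks(sequence):
        h = block_hash(block)
        if h in local_table:
            labels.append("local")
        elif h in master_table:
            labels.append("remote")
        else:
            labels.append("new")
    return labels
```

Under these assumptions, only blocks labeled "new" would need to be stored; "local" and "remote" blocks are candidates for replacement by pointers.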
20 Claims
1. A method for globally de-duplicating data inline in a shared architecture, the method comprising:
receiving a digital sequence for storage on a first storage system in a network that includes the first storage system and one or more additional storage systems, wherein the first storage system and each of the one or more additional storage systems include a de-duplication client, wherein the first storage system includes original data and at least a snapshot of the original data;
determining that the digital sequence includes at least one block of data that is not stored in the first storage system by the de-duplication client of the first storage system;
determining that the at least one block of data is a duplicate of a block of data already stored on one of the one or more additional storage systems, wherein the de-duplication client of the first storage system cooperates with a de-duplication server to determine that the at least one block of data is a duplicate of a block of data already stored on one of the one or more additional storage systems; and
storing, on the first storage system, a pointer or reference that points to the block of data already stored on the one of the one or more additional storage systems, wherein the at least one block of data is not stored on the first storage system, wherein a single instance of the at least one block of data is used for the original data and the snapshot in the first storage system and in the one or more additional storage systems.
Dependent claims: 2, 3, 4, 5, 6, 7
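The inline flow of claim 1 might look roughly like the sketch below. Everything here is hypothetical scaffolding for illustration: the `DedupServer` and `StorageSystem` classes, the in-memory dictionaries, the 4-byte block size, and SHA-256 as the hash are assumptions, not details taken from the claim.

```python
import hashlib


def h(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class DedupServer:
    """Hypothetical server: maps a block hash to the system holding the block."""

    def __init__(self):
        self.master = {}

    def lookup(self, digest):
        return self.master.get(digest)

    def register(self, digest, system_name):
        self.master.setdefault(digest, system_name)


class StorageSystem:
    """Hypothetical storage system with a built-in de-duplication client."""

    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.blocks = {}    # digest -> bytes stored locally
        self.pointers = {}  # digest -> name of the remote system holding the block

    def write_inline(self, sequence: bytes, block_size: int = 4) -> list:
        """De-duplicate while writing; return the list of digests for reassembly."""
        recipe = []
        for i in range(0, len(sequence), block_size):
            block = sequence[i:i + block_size]
            d = h(block)
            recipe.append(d)
            if d in self.blocks or d in self.pointers:
                continue  # already known on this system
            owner = self.server.lookup(d)
            if owner is not None and owner != self.name:
                self.pointers[d] = owner  # store a reference; the block is never written
            else:
                self.blocks[d] = block    # first instance anywhere: store and announce it
                self.server.register(d, self.name)
        return recipe
```

In this sketch a block that already exists on another system never touches local storage, which is the distinguishing property of the inline variant.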
8. A method for globally de-duplicating data post-process in a shared architecture, the method comprising:
storing a digital sequence on a first storage system in a network that includes the first storage system and one or more additional storage systems, wherein each of the first storage system and the one or more additional storage systems include a de-duplication client, wherein the first storage system includes original data and at least a snapshot of the original data;
determining that the digital sequence includes at least one block of data that is not already stored in the first storage system by the de-duplication client of the first storage system;
determining that the at least one block of data is a duplicate of a block of data stored on one of the one or more additional storage systems, wherein the de-duplication client of the first storage system cooperates with a de-duplication server to determine that the at least one block of data is a duplicate of a block of data already stored on one of the one or more additional storage systems;
deleting the at least one block of data from the first storage system; and
storing, on the first storage system, a pointer or reference that points to the block of data stored on the one of the one or more additional storage systems, wherein a single instance of the at least one block of data is used for the original data and the snapshot in the first storage system.
Dependent claims: 9, 10, 11, 12, 13, 14
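The post-process variant of claim 8 differs in that the sequence is first stored in full and redundant blocks are deleted afterwards. A minimal sketch, assuming the stored blocks and the server's master index are plain dictionaries and using the hypothetical names `digest` and `post_process`:

```python
import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def post_process(local_blocks: dict, master_index: dict, system_name: str) -> dict:
    """Replace blocks that another system already holds with pointers.

    local_blocks: digest -> bytes, as written before de-duplication
    master_index: digest -> name of the system holding the block (server side)
    Returns digest -> owning system for every block deleted locally.
    """
    pointers = {}
    for d in list(local_blocks):  # copy the keys, since entries are deleted in the loop
        owner = master_index.get(d)
        if owner is None:
            master_index[d] = system_name  # first copy in the architecture: keep it
        elif owner != system_name:
            del local_blocks[d]            # delete the redundant local copy
            pointers[d] = owner            # and remember where the single instance lives
    return pointers
```

The trade-off relative to the inline flow is that redundant blocks occupy local space until the pass runs, while the write path itself stays free of lookups.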
15. A system for reducing redundant data across a plurality of storage systems, the system comprising:
a de-duplication server maintaining a master table or index of data stored on a plurality of storage systems; and
a plurality of de-duplication clients each operating on a corresponding one of the plurality of storage systems to de-duplicate redundant data either stored on or being written to a corresponding storage system relative to data already stored in the plurality of storage systems, wherein the plurality of storage systems includes original data and at least one snapshot of the original data wherein each de-duplication client maintains a local table or index of data for the corresponding storage system, wherein each de-duplication client uses the local table and each de-duplication client coordinates with the de-duplication server to use the master table to de-duplicate the redundant data across the plurality of storage systems, wherein a pointer or reference is used to point to data on the other storage systems when data is determined to be redundant, wherein a single instance of the each block of the data that has been de-duplicated is used for the original data and the snapshots across the plurality of storage systems.
Dependent claims: 16, 17, 18, 19, 20
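One way to picture the single-instance property in claim 15 is to treat both the original data and a snapshot as lists of block digests that resolve through the local table or, failing that, the master table. The sketch below is an assumption-laden toy: `Server`, `Client`, the `peers` registry, and the digest-list representation of snapshots are all hypothetical, and deletion and reference counting are not modeled.

```python
import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class Server:
    """Hypothetical de-duplication server holding the master table."""

    def __init__(self):
        self.master = {}  # digest -> name of the client storing the single instance


class Client:
    """Hypothetical de-duplication client with its own local table."""

    def __init__(self, name, server, peers):
        self.name, self.server, self.peers = name, server, peers
        self.local = {}  # local table: digest -> bytes
        peers[name] = self

    def put(self, block: bytes) -> str:
        d = digest(block)
        # Store the block only if no system in the architecture already has it.
        if d not in self.local and d not in self.server.master:
            self.local[d] = block
            self.server.master[d] = self.name
        return d  # the digest doubles as the pointer or reference

    def get(self, d: str) -> bytes:
        if d in self.local:
            return self.local[d]
        return self.peers[self.server.master[d]].local[d]  # follow the pointer


server, peers = Server(), {}
a, b = Client("A", server, peers), Client("B", server, peers)

original = [b.put(b"base"), b.put(b"data")]
snapshot = list(original)        # the snapshot shares every block with the original
original[1] = b.put(b"edit")     # a later change adds one new block only
remote_copy = [a.put(b"base")]   # system A references B's instance instead of storing it
```

Here `snapshot` still reads back as `b"basedata"` through `b.get`, `original` reads as `b"baseedit"`, and system A holds no blocks of its own, so exactly one instance of `b"base"` serves the original, the snapshot, and the second system.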
Specification