SYSTEM AND METHOD FOR ON-THE-FLY ELIMINATION OF REDUNDANT DATA
Abstract
A system and method for “on-the-fly” de-duplication of data before storing the data in a storage system is provided. A data de-duplication module illustratively cooperates with protocol servers and a file system of a storage operating system executing on the storage system to implement the novel de-duplication technique. The de-duplication module illustratively generates a block store, an index file and a hash table on storage space provided by the storage system. The hash table is utilized for tracking fingerprints and locations of blocks within the block store. The index file is utilized for storing directory information identifying the contents of data containers stored on the storage system, while the block store is utilized to store raw data blocks that comprise the data containers.
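The abstract names three structures: a hash table (fingerprint to block location), an index file (what each data container is made of), and a block store (the raw blocks). A minimal in-memory sketch of how they relate on the read path follows. It assumes that a block's "location" is its position in a list, that SHA-256 serves as the fingerprint, and that the index maps a container name to an ordered list of block locations. The class name `DedupStore` and method `read` are hypothetical, and the patent places these structures on storage space rather than in memory.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical in-memory stand-in for the three on-disk structures."""

    # Block store: raw data blocks; a block's list position is its "location".
    block_store: list[bytes] = field(default_factory=list)
    # Hash table: fingerprint -> location of the block in the block store.
    hash_table: dict[str, int] = field(default_factory=dict)
    # Index file: data-container name -> ordered list of block locations.
    index: dict[str, list[int]] = field(default_factory=dict)

    def read(self, name: str) -> bytes:
        """Reassemble a data container by following its index entry."""
        return b"".join(self.block_store[loc] for loc in self.index[name])


blocks = [b"AAAA", b"BBBB"]
store = DedupStore(
    block_store=blocks,
    # SHA-256 is an assumed fingerprint function, not one named in the text.
    hash_table={hashlib.sha256(b).hexdigest(): i for i, b in enumerate(blocks)},
    # Two containers share the same two stored blocks.
    index={"a.txt": [0, 1, 0], "b.txt": [1, 1]},
)
print(store.read("a.txt"))  # b'AAAABBBBAAAA'
```

Here 20 bytes of container data (`a.txt` plus `b.txt`) are served from 8 bytes of stored blocks, because the index refers to shared locations instead of holding copies.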
23 Claims
1. A method for elimination of redundant data, the method comprising the steps of:
receiving a set of data;
partitioning the set of data into one or more blocks;
generating a fingerprint for each of the one or more blocks;
determining whether a previously stored block has a fingerprint matching one of the generated fingerprints;
in response to determining that a previously stored block has a matching fingerprint, identifying a location of the previously stored block in a block store and updating indexing information with the location information of the block; and
in response to determining that there is not a previously stored block having a matching fingerprint, storing the identified block in the block store and updating indexing information with a location of the stored block in the block store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
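The steps of claim 1 can be sketched as a single function. This is an illustration under stated assumptions, not the patented implementation. The claim does not fix a block size, fingerprint function, or lookup structure, so the sketch assumes fixed-size blocks (`BLOCK_SIZE`, hypothetical and tiny for readability), SHA-256 fingerprints, and a Python `dict` as the fingerprint lookup (the abstract's hash table). The returned list of locations plays the role of the "indexing information".

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def deduplicate(data: bytes, block_store: list[bytes],
                hash_table: dict[str, int]) -> list[int]:
    """Store `data` with duplicate blocks eliminated (hypothetical helper).

    Returns the block-store location of each block, in order, so the
    data can be reassembled later.
    """
    locations: list[int] = []
    # Partition the set of data into fixed-size blocks (last may be shorter).
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        # Generate a fingerprint for the block (SHA-256 is an assumption).
        fp = hashlib.sha256(block).hexdigest()
        loc = hash_table.get(fp)
        if loc is None:
            # No previously stored block matches: store it and record where.
            loc = len(block_store)
            block_store.append(block)
            hash_table[fp] = loc
        # Either way, update the indexing information with the location.
        locations.append(loc)
    return locations


block_store: list[bytes] = []
hash_table: dict[str, int] = {}
index = {"a.txt": deduplicate(b"AAAABBBBAAAA", block_store, hash_table)}
index["b.txt"] = deduplicate(b"BBBBAAAACC", block_store, hash_table)
print(index)        # {'a.txt': [0, 1, 0], 'b.txt': [1, 0, 2]}
print(block_store)  # [b'AAAA', b'BBBB', b'CC']
```

Only the short trailing block `CC` from the second write is new, so it is the only block appended; every other block resolves to an existing location through the fingerprint lookup.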
12. A system for elimination of redundant data, the system comprising:
a computer executing a data de-duplication module, the de-duplication module managing a block store, an index and a hash table; and
wherein the de-duplication module is configured to, in response to receiving a set of data, (i) partition the set of data into one or more blocks, (ii) generate a fingerprint for each of the one or more blocks and (iii) in response to determining that a block has not been previously stored within the block store, store the block within the block store.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20.
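Claim 12 frames the same behavior as a module that manages a block store, an index and a hash table and acts "in response to receiving a set of data". The sketch below is a hedged illustration of the "on-the-fly" aspect. It assumes the incoming data arrives as a buffered binary stream that is partitioned and fingerprinted block by block as it is read, so the whole set never has to be buffered. Fixed-size blocks, SHA-256, the in-memory structures, and the names `DeDupModule`, `write` and `blocks_with_fingerprints` are all assumptions for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def blocks_with_fingerprints(stream: BinaryIO,
                             block_size: int) -> Iterator[tuple[bytes, str]]:
    """Partition a stream into blocks and fingerprint each as it arrives.

    Assumes a buffered stream, where read(n) returns n bytes until EOF.
    """
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block, hashlib.sha256(block).hexdigest()


class DeDupModule:
    """Hypothetical module managing a block store, an index and a hash table."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.block_store: list[bytes] = []
        self.hash_table: dict[str, int] = {}      # fingerprint -> location
        self.index: dict[str, list[int]] = {}     # container -> locations

    def write(self, name: str, stream: BinaryIO) -> int:
        """Ingest `stream` as container `name`; return count of new blocks."""
        new_blocks = 0
        locations: list[int] = []
        for block, fp in blocks_with_fingerprints(stream, self.block_size):
            if fp not in self.hash_table:
                # Block not previously stored: store it in the block store.
                self.hash_table[fp] = len(self.block_store)
                self.block_store.append(block)
                new_blocks += 1
            locations.append(self.hash_table[fp])
        self.index[name] = locations
        return new_blocks


module = DeDupModule(block_size=4)
first = module.write("a.txt", io.BytesIO(b"AAAABBBBAAAA"))
second = module.write("b.txt", io.BytesIO(b"BBBBAAAA"))
print(first, second)  # 2 0
```

The second write stores nothing new: both of its blocks are found in the hash table, so only its index entry is recorded.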
21. A system for elimination of redundant data, the system comprising:
means for receiving a set of data;
means for partitioning the set of data into one or more blocks;
means for generating a fingerprint for each of the one or more blocks;
means for determining whether a previously stored block has a fingerprint matching one of the generated fingerprints;
in response to determining that a previously stored block has an identical fingerprint, identifying a location of the previously stored block in a block store and updating indexing information with the location information of the block; and
in response to determining that there is not a previously stored block having an identical fingerprint, means for storing the identified block in the block store and updating indexing information with a location of the stored block in the block store.
22. A computer readable medium for elimination of redundant data, the computer readable medium including instructions for performing the steps of:
receiving a set of data;
partitioning the set of data into one or more blocks;
generating a fingerprint for each of the one or more blocks;
determining whether a previously stored block has a fingerprint matching one of the generated fingerprints;
in response to determining that a previously stored block has an identical fingerprint, identifying a location of the previously stored block in a block store and updating indexing information with the location information of the block; and
in response to determining that there is not a previously stored block having an identical fingerprint, storing the identified block in the block store and updating indexing information with a location of the stored block in the block store.
Dependent claims: 23.
Specification