System performing data deduplication using a dense tree data structure
First Claim
1. A method, comprising:
- generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices;
inserting the fingerprint into a first level of a dense tree data structure having a plurality of levels;
initiating a merge operation between the first level and a second level of the dense tree data structure based on the first level of the dense tree being filled to a threshold capacity, wherein the first level of the dense tree stores a first set of fingerprints and the second level stores a second set of fingerprints; and
in response to initiating the merge operation, comparing the first set of fingerprints stored in the first level with the second set of fingerprints stored in the second level of the dense tree data structure to identify one or more duplicate fingerprints and performing data deduplication for selected data, corresponding to the one or more identified duplicate fingerprints, stored on the one or more storage devices.
1 Assignment
0 Petitions
Accused Products
Abstract
In one embodiment, as new blocks of data are written to storage devices of a storage system, fingerprints are generated for those new blocks and inserted as entries into a top level (L0) of a dense tree data structure. When L0 is filled, the contents from L0 may be merged with level 1 (L1). After the initial merge, new fingerprints are added to L0 until L0 fills up again, which triggers a new merge. Duplicate fingerprints in L0 and L1 are identified which, in turn, indicates duplicate data blocks. A post-processing deduplication operation is then performed to remove duplicate data blocks corresponding to the duplicate fingerprints. In a different embodiment, as new fingerprint entries are loaded into L0, those new fingerprints may be compared with existing fingerprints loaded into L0 and/or other levels to facilitate inline deduplication to identify duplicate fingerprints and subsequently perform the deduplication operation.
-
Citations
19 Claims
-
1. A method, comprising:
-
generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices; inserting the fingerprint into a first level of a dense tree data structure having a plurality of levels; initiating a merge operation between the first level and a second level of the dense tree data structure based on the first level of the dense tree being filled to a threshold capacity, wherein the first level of the dense tree stores a first set of fingerprints and the second level stores a second set of fingerprints; and in response to initiating the merge operation, comparing the first set of fingerprints stored in the first level with the second set of fingerprints stored in the second level of the dense tree data structure to identify one or more duplicate fingerprints and performing data deduplication for selected data, corresponding to the one or more identified duplicate fingerprints, stored on the one or more storage devices. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method comprising:
- generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices;
comparing the fingerprint with other fingerprints stored in a first level of a dense tree data structure having a plurality of levels, to identify a duplicate fingerprint;
in response to identifying the duplicate fingerprint in the dense tree data structure, performing data deduplication associated with the identified data stored on the one or more storage devices, wherein the identified data corresponds to the identified duplicate fingerprint, and storing the fingerprint in the first level of the dense tree data structure; and
in response to not identifying the duplicate fingerprint in the dense tree data structure, storing the fingerprint in the first level of the dense tree data structure. - View Dependent Claims (7, 8, 9, 10, 11)
- generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices;
-
12. A system, comprising:
-
a processor configured to execute one or more processes; and a memory configured to store a process executable by the processor, the process configured to; generate a fingerprint identifying data to be stored on one or more storage devices; insert the fingerprint into a first level of a dense tree data structure having a plurality of levels; initiate a merge operation between the first level and a second level of the dense tree data structure in response to the first level being filled to a threshold capacity, wherein the first level of the dense tree stores a first set of fingerprints and the second level stores a second set of fingerprints; compare the first set of fingerprints stored in the first level and with the second set of fingerprints stored in the second level of the dense tree data structure to identify one or more duplicate fingerprints in response to initiating the merge operation; and perform data deduplication for selected data, corresponding to the one or more identified duplicate fingerprints, stored on the one or more storage device. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19)
-
Specification