DICTIONARY FOR DATA DEDUPLICATION
Abstract
Mechanisms are provided for efficiently improving a dictionary used for data deduplication. Dictionaries hold hash key and location pairs for deduplicated data. Strong hash keys prevent collisions, but weak hash keys are more efficient in both computation and storage. Mechanisms are provided to use both a weak hash key and a strong hash key. Weak hash keys and their corresponding locations are stored as pairs in an improved dictionary, while strong hash keys are maintained with the deduplicated data itself. The need for uniqueness from a strong hash function is balanced against the deduplication dictionary space savings from a weak hash function.
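The space trade-off described in the abstract can be illustrated with a rough sketch. The abstract does not name specific hash functions, so the choices here are assumptions made only for illustration: CRC-32 (4 bytes) stands in for the weak hash and SHA-256 (32 bytes) for the strong hash. The helper `dictionary_key_bytes` is hypothetical and counts key storage only, ignoring location fields and container overhead.

```python
import hashlib
import zlib

# Assumption: CRC-32 stands in for the weak hash, SHA-256 for the strong hash.
# The source does not specify either function.
WEAK_KEY_BYTES = 4
STRONG_KEY_BYTES = hashlib.sha256().digest_size  # 32 bytes


def weak_hash(segment: bytes) -> int:
    """Cheap, short hash used as the dictionary key (collisions are possible)."""
    return zlib.crc32(segment)


def strong_hash(segment: bytes) -> bytes:
    """Longer cryptographic hash, kept with the stored segment rather than in the dictionary."""
    return hashlib.sha256(segment).digest()


def dictionary_key_bytes(num_segments: int, key_bytes: int) -> int:
    """Hypothetical helper: bytes spent on hash keys alone for num_segments entries."""
    return num_segments * key_bytes


if __name__ == "__main__":
    n = 1_000_000
    print("weak keys only:  ", dictionary_key_bytes(n, WEAK_KEY_BYTES), "bytes")
    print("strong keys only:", dictionary_key_bytes(n, STRONG_KEY_BYTES), "bytes")
```

Under these assumed sizes, one million dictionary entries need 4,000,000 bytes of weak keys versus 32,000,000 bytes if strong keys were stored in the dictionary instead.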
20 Claims
1. A method, comprising:
- identifying a segment in a file for deduplication;
- generating a strong hash value and a weak hash value for the segment;
- comparing the weak hash value to a plurality of weak hash values maintained in a deduplication dictionary;
- reading metadata from a segment entry corresponding to the weak hash value;
- comparing the strong hash value to a stored strong hash value maintained in the segment entry.
11. A system, comprising:
- an interface configured to receive a segment in a file for deduplication;
- a processor configured to generate a strong hash value and a weak hash value for the segment and compare the weak hash value to a plurality of weak hash values maintained in a deduplication dictionary;
- wherein metadata from a segment entry corresponding to the weak hash value is read and the strong hash value is compared to a stored strong hash value maintained in the segment entry.
20. A computer readable storage medium having computer code embodied therein, the computer readable storage medium comprising:
- computer code for identifying a segment in a file for deduplication;
- computer code for generating a strong hash value and a weak hash value for the segment;
- computer code for comparing the weak hash value to a plurality of weak hash values maintained in a deduplication dictionary;
- computer code for reading metadata from a segment entry corresponding to the weak hash value;
- computer code for comparing the strong hash value to a stored strong hash value maintained in the segment entry.
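The steps recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SegmentEntry` and `DedupStore` are hypothetical, CRC-32 and SHA-256 are assumed stand-ins for the weak and strong hashes, and keeping a list of candidate locations per weak hash value is one assumed way to handle weak-hash collisions that the claim itself does not specify.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SegmentEntry:
    """Hypothetical segment entry: the strong hash is kept with the data itself."""
    strong: bytes
    data: bytes
    refcount: int = 1


class DedupStore:
    """Hypothetical store: the dictionary holds only weak hash -> entry locations."""

    def __init__(
        self,
        weak: Callable[[bytes], int] = zlib.crc32,  # assumed weak hash
        strong: Callable[[bytes], bytes] = lambda b: hashlib.sha256(b).digest(),  # assumed strong hash
    ) -> None:
        self.weak = weak
        self.strong = strong
        self.dictionary: Dict[int, List[int]] = {}
        self.entries: List[SegmentEntry] = []

    def add(self, segment: bytes) -> Tuple[int, bool]:
        """Return (location, was_duplicate) for the given segment."""
        # Generate a strong hash value and a weak hash value for the segment.
        w = self.weak(segment)
        s = self.strong(segment)
        # Compare the weak hash value against those kept in the dictionary.
        for loc in self.dictionary.get(w, []):
            # Read metadata from the segment entry for this weak hash value.
            entry = self.entries[loc]
            # Compare the strong hash value to the stored strong hash value.
            if entry.strong == s:
                entry.refcount += 1
                return loc, True
        # No confirmed match: store the segment as new (assumed behaviour).
        loc = len(self.entries)
        self.entries.append(SegmentEntry(strong=s, data=segment))
        self.dictionary.setdefault(w, []).append(loc)
        return loc, False


if __name__ == "__main__":
    store = DedupStore()
    print(store.add(b"segment A"))  # first time seen
    print(store.add(b"segment A"))  # same location, flagged as duplicate
```

In this sketch a weak hash match is treated only as a hint: two different segments that share a weak hash value end up as separate entries, because the strong hash stored in each entry decides whether the data is really the same.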
Specification