Deduplicating storage with enhanced frequent-block detection
Abstract
Detecting data duplication includes maintaining a fingerprint directory including one or more entries. Each entry includes a data fingerprint and a data location for a data chunk. A shadow list including a record of fingerprint values not contained in the fingerprint directory is maintained. Each entry is associated with a seen-count attribute, which is an indication of how often a data fingerprint has been seen in arriving data chunks to be written in a storage system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk. Each entry retrieved from the shadow list relates to twice-seen fingerprints.
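The interplay between the fingerprint directory, the seen-count attribute, and the shadow list can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Deduplicator` class, the SHA-256 fingerprint, the fixed directory capacity, and the eviction policy (oldest once-seen entry first) are assumptions chosen only to make the example runnable. The sketch also assumes that a hit in the shadow list causes the fingerprint to re-enter the directory with a seen-count of two.

```python
import hashlib
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    location: int    # where the chunk is stored (a simple counter here)
    seen_count: int  # 1 = once-seen, 2 or more = multiply-seen


class Deduplicator:
    """Illustrative only: bounded fingerprint directory plus a shadow list."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.directory = OrderedDict()  # fingerprint -> Entry, oldest first
        self.shadow = set()             # fingerprint values evicted from the directory
        self.next_location = 0

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # Assumption: any collision-resistant hash could serve as the fingerprint.
        return hashlib.sha256(chunk).hexdigest()

    def write(self, chunk: bytes):
        """Return (location, is_duplicate) for an arriving chunk."""
        fp = self.fingerprint(chunk)
        entry = self.directory.get(fp)
        if entry is not None:
            # Directory hit: the chunk is a duplicate, so only the count changes.
            entry.seen_count += 1
            return entry.location, True

        # Directory miss: the chunk must be stored at a new location.
        location = self.next_location
        self.next_location += 1

        # The shadow list holds fingerprints only, so a hit there tells us the
        # fingerprint has now been seen twice, but not where the data lives.
        if fp in self.shadow:
            self.shadow.discard(fp)
            seen = 2
        else:
            seen = 1

        self._make_room()
        self.directory[fp] = Entry(location, seen)
        return location, False

    def _make_room(self):
        if len(self.directory) < self.capacity:
            return
        # Assumed policy: evict the oldest once-seen entry, falling back to
        # the oldest entry overall when every entry is multiply-seen.
        victim = next(
            (fp for fp, e in self.directory.items() if e.seen_count == 1), None
        )
        if victim is None:
            victim = next(iter(self.directory))
        del self.directory[victim]
        self.shadow.add(victim)
```

With a capacity of two, writing chunk A twice, then B, then C evicts B (the oldest once-seen entry) into the shadow list. A later write of B misses the directory but hits the shadow list, so B returns to the directory already marked as multiply-seen and is less likely to be evicted again under this policy.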
20 Claims
1. A method for detecting data duplication, comprising:
maintaining a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk;
maintaining a shadow list comprising a record of fingerprint values removed from the fingerprint directory, wherein the shadow list comprises an allocation of resources with associated methods to insert fingerprints and to look up fingerprints, and to return a result of a lookup in the shadow list;
associating each said entry with a seen-count attribute which is an indication of how often a data fingerprint has been seen in arriving data chunks to be written in a storage system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk; and
retrieving entries from the shadow list such that each entry retrieved from the shadow list comprises twice-seen fingerprints.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product for detecting data duplication, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
maintain, by the processor, a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk;
maintain, by the processor, a shadow list comprising a record of fingerprint values removed from the fingerprint directory, wherein the shadow list comprises an allocation of resources with associated methods to insert fingerprints and to look up fingerprints, and to return a result of a lookup in the shadow list;
associate, by the processor, each said entry with a seen-count attribute which is an indication of how often a data fingerprint has been seen in arriving data chunks to be written in a storage system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk; and
retrieve, by the processor, entries from the shadow list such that each entry retrieved from the shadow list comprises twice-seen fingerprints.
Dependent claims: 10, 11, 12
13. A system for detecting data duplication, comprising:
a memory device;
a fingerprint controller coupled to the memory device, the fingerprint controller configured to:
  maintain a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk, and
  associate each said entry with a seen-count attribute which is an indication of how often a data fingerprint has been seen in arriving data chunks to be written in a storage system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk; and
a shadow list controller configured to maintain a shadow list comprising a record of fingerprint values removed from the fingerprint directory, and to retrieve entries from the shadow list such that each entry retrieved from the shadow list comprises twice-seen fingerprints, wherein the shadow list comprises an allocation of system resources with associated methods to insert fingerprints and to look up fingerprints, and to return a result of a lookup in the shadow list.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
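The independent claims describe the shadow list as an allocation of resources with methods to insert fingerprints, look them up, and return the lookup result. One way to picture that interface is the sketch below. It is an assumption-laden illustration, not the claimed implementation: the `ShadowList` class name, the `max_entries` bound, the first-in-first-out trimming, and the `retrieve` helper are all hypothetical.

```python
from collections import OrderedDict


class ShadowList:
    """Illustrative bounded record of fingerprints removed from a directory."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries       # assumed fixed resource allocation
        self._fingerprints = OrderedDict()   # fingerprint -> None, oldest first

    def insert(self, fingerprint: str) -> None:
        # Re-inserting moves the fingerprint to the newest position.
        self._fingerprints.pop(fingerprint, None)
        self._fingerprints[fingerprint] = None
        # Assumed policy: drop the oldest fingerprints once the bound is exceeded.
        while len(self._fingerprints) > self.max_entries:
            self._fingerprints.popitem(last=False)

    def lookup(self, fingerprint: str) -> bool:
        # Returns the result of the lookup without modifying the list.
        return fingerprint in self._fingerprints

    def retrieve(self, fingerprint: str) -> bool:
        # A hit means the fingerprint was seen once before eviction and is
        # being seen again now, so the caller can treat it as twice-seen.
        if fingerprint in self._fingerprints:
            del self._fingerprints[fingerprint]
            return True
        return False
```

Because only fingerprint values are recorded, the structure stays small relative to the directory, which also stores data locations. A probabilistic structure could serve the same role at lower memory cost, though that would be a further design assumption beyond what the claims state.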
Specification