Deduplicating storage with enhanced frequent-block detection
Abstract
Detecting data duplication comprises maintaining a fingerprint directory including one or more entries, each entry including a data fingerprint and a data location for a data chunk. Each entry is associated with a seen-count attribute which is an indication of how often the fingerprint has been seen in arriving data chunks. Higher-frequency entries in the directory are retained, while also taking into account recency of data accesses. A data duplication detector detects that the data fingerprint for a new chunk is the same as the data fingerprint contained in an entry in the fingerprint directory.
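As a minimal sketch of the directory described in the abstract, the Python below maps each fingerprint to an entry holding a data location and a seen-count. It assumes SHA-256 as the fingerprint, an unbounded in-memory dict (no retention policy), and an append offset as the data location. The names `Entry`, `FingerprintDirectory`, and `write_chunk` are hypothetical and not taken from the patent.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    location: int        # where the chunk's data is stored (hypothetical offset)
    seen_count: int = 1  # number of arriving chunks that carried this fingerprint

    @property
    def multiply_seen(self) -> bool:
        # Distinguishes multiply-seen entries from once-seen ones.
        return self.seen_count >= 2


class FingerprintDirectory:
    """Hypothetical unbounded directory: fingerprint -> Entry."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}
        self.next_location = 0

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # Assumption: SHA-256 serves as the data fingerprint.
        return hashlib.sha256(chunk).hexdigest()

    def write_chunk(self, chunk: bytes) -> tuple[int, bool]:
        """Return (location, is_duplicate) for an arriving chunk."""
        fp = self.fingerprint(chunk)
        entry = self.entries.get(fp)
        if entry is not None:
            entry.seen_count += 1        # fingerprint seen again
            return entry.location, True  # duplicate: reuse the stored location
        location = self.next_location    # new data: allocate space for it
        self.next_location += len(chunk)
        self.entries[fp] = Entry(location)
        return location, False
```

For example, writing `b"alpha"`, `b"beta"`, then `b"alpha"` again returns `(0, False)`, `(5, False)`, and `(0, True)`, leaving the `b"alpha"` entry multiply-seen with a seen-count of 2.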
20 Claims
1. A method for detecting data duplication, comprising:
maintaining a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk;
associating each said entry with a seen-count attribute which is an indication of how often a data fingerprint has been seen in arriving data chunks to be written in a storage system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk;
retaining higher-frequency entries, while also taking into account recency of data accesses for the higher-frequency entries based on the seen-count attribute and the data access age; and
detecting that the data fingerprint for a new chunk is the same as the data fingerprint contained in an entry in the fingerprint directory, wherein a policy is applied for distinguishing multiple seen-count categories based on tracking data access ages of entries in the fingerprint directory for different seen-count categories.
(Dependent claims 2, 3, 4, 5, 6, 7, 8.)
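One way to read the claimed policy is a directory split into once-seen and multiply-seen categories, each tracking the data access age of its entries. The sketch below is a hypothetical segmented-LRU policy, not the patent's own: the logical clock, `capacity`, `max_multi_age`, and the eviction rule (evict the least recently accessed once-seen entry, unless the least recently accessed multiply-seen entry has an access age above `max_multi_age`) are all assumptions chosen for illustration.

```python
from __future__ import annotations

from collections import OrderedDict


class SegmentedDirectory:
    """Hypothetical bounded directory with once-seen and multiply-seen
    categories, each kept in least-recently-accessed-first order."""

    def __init__(self, capacity: int, max_multi_age: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_multi_age = max_multi_age  # assumed staleness limit
        self.clock = 0                      # logical time: +1 per lookup
        # fingerprint -> (location, time of last access), oldest first
        self.once: OrderedDict[str, tuple[int, int]] = OrderedDict()
        self.multi: OrderedDict[str, tuple[int, int]] = OrderedDict()

    def _oldest_age(self, segment: OrderedDict[str, tuple[int, int]]) -> int:
        # Data access age of the least recently accessed entry in a segment.
        _, (_, last_access) = next(iter(segment.items()))
        return self.clock - last_access

    def _evict(self) -> None:
        # Retain multiply-seen entries unless the oldest one has gone stale;
        # otherwise evict the least recently accessed once-seen entry.
        if self.multi and (not self.once
                           or self._oldest_age(self.multi) > self.max_multi_age):
            self.multi.popitem(last=False)
        else:
            self.once.popitem(last=False)

    def lookup_or_insert(self, fp: str, location: int) -> int | None:
        """Return the stored location on a duplicate; otherwise insert fp."""
        self.clock += 1
        if fp in self.multi:
            loc, _ = self.multi.pop(fp)
            self.multi[fp] = (loc, self.clock)  # refresh recency
            return loc
        if fp in self.once:
            loc, _ = self.once.pop(fp)
            self.multi[fp] = (loc, self.clock)  # promote: seen twice now
            return loc
        if len(self.once) + len(self.multi) >= self.capacity:
            self._evict()
        self.once[fp] = (location, self.clock)
        return None
```

With `capacity=3` and `max_multi_age=5`, a fingerprint looked up twice outlives three newer once-seen entries and is evicted only when space is needed while its access age exceeds 5. Seen-count selects the category; access age decides when a high-frequency entry is no longer worth retaining.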
9. A computer program product for detecting data duplication, the computer program product comprising:
a non-transitory tangible storage medium readable by a computer system and storing instructions for execution by the computer system for performing a method comprising:
maintaining a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk;
associating each said entry with a seen-count attribute which is an indication of how often the data fingerprint has been seen in arriving data chunks to be written in a storage system and is used for distinguishing multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk;
retaining higher-frequency entries, while also taking into account recency of data accesses for the higher-frequency entries based on the seen-count attribute and data access age; and
detecting that the data fingerprint for a new chunk is the same as the data fingerprint contained in an entry in the fingerprint directory, wherein a policy is applied for distinguishing multiple seen-count categories based on tracking data access ages of entries in the fingerprint directory for different seen-count categories.
(Dependent claims 10, 11, 12.)
13. A system for detecting data duplication, comprising:
a memory device;
a fingerprint controller coupled to the memory device, the fingerprint controller maintains a fingerprint directory comprising one or more entries, each entry including a data fingerprint and a data location for a data chunk in a storage device;
wherein each entry is associated with a seen-count attribute which is an indication of how often the fingerprint has been seen in arriving data chunks to be written in the system, and distinguishes multiply-seen entries for data fingerprints present in at least two data chunks from once-seen entries for data fingerprints present in no more than a single data chunk, and wherein the fingerprint controller retains higher-frequency entries, while also taking into account recency of data accesses for the higher-frequency entries based on the seen-count attribute and data access age; and
a duplicate detector that detects if the data fingerprint for a new chunk is the same as the data fingerprint contained in an entry in the fingerprint directory, wherein a policy is applied for distinguishing multiple seen-count categories based on tracking data access ages of entries in the fingerprint directory for different seen-count categories.
(Dependent claims 14, 15, 16, 17, 18, 19, 20.)
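Claim 13 divides the work among a storage device, a fingerprint controller that owns the directory, and a duplicate detector. The sketch below mirrors that split under stated assumptions: fixed-size chunking, SHA-256 fingerprints, and an append-only `bytearray` standing in for the storage device. All class and function names are hypothetical, and the directory is unbounded (no retention policy is shown).

```python
from __future__ import annotations

import hashlib


class StorageDevice:
    """Hypothetical append-only device holding unique chunk data."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, chunk: bytes) -> int:
        offset = len(self.data)
        self.data += chunk
        return offset


class FingerprintController:
    """Maintains the directory: fingerprint -> [location, seen_count]."""

    def __init__(self) -> None:
        self.directory: dict[bytes, list[int]] = {}

    def record(self, fp: bytes, location: int) -> None:
        self.directory[fp] = [location, 1]  # first sighting: once-seen

    def bump(self, fp: bytes) -> int:
        entry = self.directory[fp]
        entry[1] += 1                       # seen-count: fingerprint seen again
        return entry[0]


class DuplicateDetector:
    """Checks an arriving chunk's fingerprint against the directory."""

    def __init__(self, controller: FingerprintController) -> None:
        self.controller = controller

    def is_duplicate(self, fp: bytes) -> bool:
        return fp in self.controller.directory


def write(buf: bytes, chunk_size: int, dev: StorageDevice,
          ctl: FingerprintController, det: DuplicateDetector) -> list[int]:
    """Split buf into fixed-size chunks; return each chunk's stored location."""
    locations = []
    for i in range(0, len(buf), chunk_size):
        chunk = buf[i:i + chunk_size]
        fp = hashlib.sha256(chunk).digest()
        if det.is_duplicate(fp):
            locations.append(ctl.bump(fp))  # reference the existing copy
        else:
            loc = dev.append(chunk)         # store new data exactly once
            ctl.record(fp, loc)
            locations.append(loc)
    return locations
```

Writing `b"AAAABBBBAAAA"` with 4-byte chunks stores only 8 bytes and returns locations `[0, 4, 0]`, because the third chunk's fingerprint matches the first.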
Specification