DEDUPLICATING DATA AT SUB-BLOCK GRANULARITY
3 Assignments
0 Petitions
Abstract
A technique for performing data deduplication operates at sub-block granularity by searching a deduplication database for a match between a candidate sub-block of a candidate block and a target sub-block of a previously-stored target block. When a match is found, the technique identifies a duplicate range shared between the candidate block and the target block and effects persistent storage of the duplicate range by configuring mapping metadata of the candidate block so that it points to the duplicate range in the target block.
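The flow summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; it only illustrates the idea under stated assumptions. The 512-byte sub-block size, the SHA-256 digest, the in-memory dictionaries, the byte-by-byte extension of the match, and the names SubBlockDedupe, Extent, write and read are all hypothetical choices made for this example. Blocks are assumed to be a whole multiple of the sub-block size, and block identifiers are assumed to be written only once.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

SUB_BLOCK = 512  # assumed uniform sub-block size in bytes


@dataclass
class Extent:
    """One piece of a block's mapping metadata (hypothetical layout)."""
    length: int
    data: bytes = b""                # unique bytes stored inline for this extent
    target_id: Optional[int] = None  # set when the extent references a stored block
    target_off: int = 0              # start of the referenced range in that block


class SubBlockDedupe:
    def __init__(self):
        self.blocks = {}   # block id -> bytes of fully stored (target) blocks
        self.db = {}       # sub-block digest -> (target block id, offset in block)
        self.mapping = {}  # block id -> list of Extent

    def write(self, block_id: int, cand: bytes) -> None:
        # Search the deduplication database for each candidate sub-block.
        for c_off in range(0, len(cand), SUB_BLOCK):
            sub = cand[c_off:c_off + SUB_BLOCK]
            entry = self.db.get(hashlib.sha256(sub).digest())
            if entry is None:
                continue
            t_id, t_off = entry
            target = self.blocks[t_id]  # follow the reference to the target block
            if target[t_off:t_off + len(sub)] != sub:
                continue  # digest collision: not a real match
            # Grow the matching sub-block into the largest shared range.
            s_c, s_t = c_off, t_off
            while s_c > 0 and s_t > 0 and cand[s_c - 1] == target[s_t - 1]:
                s_c, s_t = s_c - 1, s_t - 1
            e_c, e_t = c_off + len(sub), t_off + len(sub)
            while e_c < len(cand) and e_t < len(target) and cand[e_c] == target[e_t]:
                e_c, e_t = e_c + 1, e_t + 1
            # Store only the unique bytes; the duplicate range becomes a reference.
            extents = [
                Extent(s_c, data=cand[:s_c]),
                Extent(e_c - s_c, target_id=t_id, target_off=s_t),
                Extent(len(cand) - e_c, data=cand[e_c:]),
            ]
            self.mapping[block_id] = [e for e in extents if e.length]
            return
        # No sub-block matched: store the whole block and index its sub-blocks.
        self.blocks[block_id] = cand
        self.mapping[block_id] = [Extent(len(cand), target_id=block_id)]
        for off in range(0, len(cand), SUB_BLOCK):
            digest = hashlib.sha256(cand[off:off + SUB_BLOCK]).digest()
            self.db.setdefault(digest, (block_id, off))

    def read(self, block_id: int) -> bytes:
        # Reassemble a block by walking its mapping metadata.
        out = bytearray()
        for e in self.mapping[block_id]:
            if e.target_id is None:
                out += e.data
            else:
                out += self.blocks[e.target_id][e.target_off:e.target_off + e.length]
        return bytes(out)
```

In this sketch a candidate block that shares a run of bytes with a stored block keeps only its unique leading and trailing bytes, and the shared run is represented by an extent that points into the target block. The sketch stops at the first matching sub-block, so it records at most one duplicate range per candidate block.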
5 Citations
22 Claims
1. A method of performing data deduplication, the method comprising:
receiving, by a data storage system, an I/O (Input/Output) request that specifies a write of a set of data to the data storage system, the data storage system defining a candidate block from at least a portion of the set of data, the candidate block including multiple uniformly-sized sub-blocks, the sub-blocks including a candidate sub-block;
searching a deduplication database for a target sub-block that matches the candidate sub-block; and
in response to finding a matching entry in the deduplication database for the target sub-block, (i) accessing, based on a reference stored in the matching entry, a previously-stored target block that contains the target sub-block, (ii) identifying a shared range between the candidate block and the target block for which a duplicate range RDUP of the candidate block matches a target range RTARG of the target block, and (iii) effecting persistent storage of the duplicate range RDUP of the candidate block by configuring mapping metadata of the candidate block to reference the target range RTARG of the target block.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 22
19. A data storage system, comprising control circuitry that includes a set of processing units coupled to memory, the control circuitry constructed and arranged to:
define, by a data storage system, multiple uniformly-sized sub-blocks of a candidate block, the sub-blocks including a candidate sub-block;
search a deduplication database for a target sub-block that matches the candidate sub-block; and
in response to finding a matching entry in the deduplication database for the target sub-block, (i) access, by following a pointer stored in the matching entry, a previously-stored target block that contains the target sub-block, (ii) identify a shared range between the candidate block and the target block for which a duplicate range RDUP of the candidate block matches a target range RTARG of the target block, and (iii) effect persistent storage of the duplicate range RDUP of the candidate block by configuring mapping metadata of the candidate block to reference the target range RTARG of the target block.
20. A computer program product including a set of non-transitory, computer-readable media having instructions which, when executed by control circuitry of a data storage system, cause the control circuitry to perform a method of performing data deduplication, the method comprising:
defining, by a data storage system, multiple uniformly-sized sub-blocks of a candidate block, the sub-blocks including a candidate sub-block;
searching a deduplication database for a target sub-block that matches the candidate sub-block; and
in response to finding a matching entry in the deduplication database for the target sub-block, (i) accessing, by following a pointer stored in the matching entry, a previously-stored target block that contains the target sub-block, (ii) identifying a shared range between the candidate block and the target block for which a duplicate range RDUP of the candidate block matches a target range RTARG of the target block, and (iii) effecting persistent storage of the duplicate range RDUP of the candidate block by configuring mapping metadata of the candidate block to reference the target range RTARG of the target block.
Dependent claims: 21
Specification