Techniques for managing deduplication based on recently written extents
Abstract
A technique is directed to managing deduplication of extents in a data storage apparatus having processing circuitry and memory which stores the extents (e.g., blocks). The technique involves constructing, by the processing circuitry, a recently written extent list which identifies recently written extents stored within the memory. The technique further involves referencing the recently written extent list to bypass (or skip over) extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication. The technique further involves processing the candidate extent for possible deduplication. Here, by identifying frequently overwritten extents on the recently written extent list, the data storage apparatus is able to easily avoid cycles of deduplicating and subsequently splitting frequently overwritten extents.
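The bypass step described in the abstract can be illustrated with a minimal sketch. It assumes extents are identified by integer ids and that the recently written extent list is held as an in-memory set; the function name `candidate_extents` and both parameters are hypothetical and do not come from the patent.

```python
from typing import Iterable, Iterator, Set


def candidate_extents(all_extent_ids: Iterable[int],
                      recently_written: Set[int]) -> Iterator[int]:
    """Yield extent ids eligible for deduplication.

    Extents identified by the recently written extent list are skipped,
    since they are likely to be overwritten again and would have to be
    split shortly after being deduplicated.
    """
    for extent_id in all_extent_ids:
        if extent_id in recently_written:
            continue  # bypass: identified by the recently written extent list
        yield extent_id


# Hypothetical example: extents 1 and 4 were written recently.
eligible = list(candidate_extents(range(6), {1, 4}))
```

A set is used here only because membership checks are cheap; the abstract does not say how the list is stored.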
21 Claims
1. In a data storage apparatus having processing circuitry and memory which stores extents, a method of managing deduplication of the extents, the method comprising:
    constructing, by the processing circuitry, a recently written extent list which identifies recently written extents stored within the memory;
    referencing the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication; and
    processing the candidate extent for possible deduplication;
    wherein the data storage apparatus maintains an extent sharing index table having entries which (i) have existing hash values and (ii) identify extents; and
    wherein processing the candidate extent for possible deduplication includes:
        digesting the candidate extent to produce a current hash value,
        searching the extent sharing index table for an existing entry having an existing hash value which matches the current hash value,
        when an existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value,
            (i) searching the recently written extent list to confirm that an existing extent, which is identified by the existing entry, is not identified by the recently written extent list,
            (ii) when the existing extent is not identified by the recently written extent list, performing a comprehensive compare operation to determine whether to deduplicate the candidate extent with the existing extent, and
            (iii) when the existing extent is identified by the recently written extent list, adding a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent, and
        when no existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value, adding a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent.
    Dependent claims: 2, 3, 4, 5.
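The candidate-processing steps recited in claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: SHA-256 stands in for the unspecified digest, the extent sharing index table is modeled as a dictionary keyed by hash value (so "adding a new entry" for an already present hash replaces the old entry), and the names `process_candidate`, `storage`, `index`, and the returned status strings are all hypothetical.

```python
import hashlib
from typing import Dict, Set


def process_candidate(candidate_id: int,
                      storage: Dict[int, bytes],
                      index: Dict[bytes, int],
                      recently_written: Set[int]) -> str:
    """Process one candidate extent for possible deduplication."""
    # Digest the candidate extent to produce the current hash value.
    digest = hashlib.sha256(storage[candidate_id]).digest()

    # Search the index for an existing entry with a matching hash value.
    existing_id = index.get(digest)
    if existing_id is None or existing_id == candidate_id:
        # No matching entry for another extent: index the candidate.
        # (Treating a self-match this way is an assumption of this sketch.)
        index[digest] = candidate_id
        return "indexed"

    # (i) Check whether the existing extent was recently written.
    if existing_id in recently_written:
        # (iii) Do not share with a volatile extent; index the candidate.
        index[digest] = candidate_id
        return "reindexed"

    # (ii) Comprehensive compare before deciding to deduplicate.
    if storage[existing_id] == storage[candidate_id]:
        return "deduplicate"
    return "hash-collision"


# Hypothetical usage: extents 0 and 3 hold identical data.
storage = {0: b"A" * 8, 2: b"B" * 8, 3: b"A" * 8}
index: Dict[bytes, int] = {}
results = [process_candidate(i, storage, index, set()) for i in (0, 2, 3)]
```

The byte-for-byte comparison guards against hash collisions; the actual sharing of storage after a successful compare is outside the scope of this sketch.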
6. In a data storage apparatus having processing circuitry and memory which stores extents, a method of managing deduplication of the extents, the method comprising:
    constructing, by the processing circuitry, a recently written extent list which identifies recently written extents stored within the memory;
    referencing the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication; and
    processing the candidate extent for possible deduplication;
    wherein constructing the recently written extent list which identifies the recently written extents includes:
        from a lock history database which stores a collection of lock information regarding write locks and other locks imposed on the extents stored in the memory, filtering the collection of lock information to ascertain write locked extents satisfying a predefined policy, and
        building the recently written extent list based only on the ascertained write locked extents satisfying the predefined policy.
    Dependent claims: 7, 8, 9.
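The list-construction step recited in claim 6 can be sketched as a filter over lock history records. The claim does not define the predefined policy, so the one used here (at least `min_writes` write locks within the last `window` seconds) is purely an assumption, as are the `LockRecord` fields and the function name `build_recently_written`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class LockRecord:
    """One hypothetical row of the lock history database."""
    extent_id: int
    lock_type: str    # e.g. "write" or "read"
    timestamp: float  # seconds, same clock as `now`


def build_recently_written(history: Iterable[LockRecord],
                           now: float,
                           window: float,
                           min_writes: int = 1) -> Set[int]:
    """Build the recently written extent list from lock history.

    Only write locks are considered; other lock types are filtered out.
    The assumed policy keeps extents with at least `min_writes` write
    locks taken no more than `window` seconds before `now`.
    """
    write_counts = Counter(
        record.extent_id
        for record in history
        if record.lock_type == "write" and now - record.timestamp <= window
    )
    return {eid for eid, count in write_counts.items() if count >= min_writes}


# Hypothetical history: extent 1 is written twice within the window.
history = [
    LockRecord(1, "write", 95.0),
    LockRecord(1, "write", 99.0),
    LockRecord(2, "read", 99.0),
    LockRecord(3, "write", 10.0),
    LockRecord(4, "write", 98.0),
]
hot = build_recently_written(history, now=100.0, window=10.0, min_writes=2)
```

Raising `min_writes` narrows the list to frequently overwritten extents, while `min_writes=1` treats any recent write as disqualifying.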
10. A data storage apparatus, comprising:
    a host interface;
    memory; and
    processing circuitry coupled to the host interface and the memory, the processing circuitry (i) processing host IO operations received through the host interface and (ii) managing deduplication of extents;
    wherein the processing circuitry, when managing deduplication of extents, is constructed and arranged to:
        construct, in the memory, a recently written extent list which identifies recently written extents stored within the memory,
        reference the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication, and
        process the candidate extent for possible deduplication; and
    wherein the processing circuitry maintains an extent sharing index table having entries which (i) have existing hash values and (ii) identify extents; and
    wherein the processing circuitry, when processing the candidate extent for possible deduplication, is constructed and arranged to:
        digest the candidate extent to produce a current hash value,
        search the extent sharing index table for an existing entry having an existing hash value which matches the current hash value,
        when an existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value,
            (i) search the recently written extent list to confirm that an existing extent, which is identified by the existing entry, is not identified by the recently written extent list,
            (ii) when the existing extent is not identified by the recently written extent list, perform a comprehensive compare operation to determine whether to deduplicate the candidate extent with the existing extent, and
            (iii) when the existing extent is identified by the recently written extent list, add a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent, and
        when no existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value, add a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent.
    Dependent claims: 11, 12.
13. A data storage apparatus, comprising:
    a host interface;
    memory; and
    processing circuitry coupled to the host interface and the memory, the processing circuitry (i) processing host IO operations received through the host interface and (ii) managing deduplication of extents;
    wherein the processing circuitry, when managing deduplication of extents, is constructed and arranged to:
        construct, in the memory, a recently written extent list which identifies recently written extents stored within the memory,
        reference the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication, and
        process the candidate extent for possible deduplication; and
    wherein the processing circuitry, when constructing the recently written extent list which identifies the recently written extents, is constructed and arranged to:
        from a lock history database which stores a collection of lock information regarding write locks and other locks imposed on the extents stored in the memory, filter the collection of lock information to ascertain write locked extents satisfying a predefined policy, and
        build the recently written extent list based only on the ascertained write locked extents satisfying the predefined policy.
    Dependent claims: 14, 15.
16. A computer program product having a non-transitory computer readable medium storing a set of instructions which, when carried out by a computerized device, directs the computerized device to manage deduplication of extents by performing a method comprising:
    constructing, by the computerized device, a recently written extent list which identifies recently written extents stored within memory of the computerized device;
    referencing the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication; and
    processing the candidate extent for possible deduplication;
    wherein the computerized device maintains an extent sharing index table having entries which (i) have existing hash values and (ii) identify extents; and
    wherein processing the candidate extent for possible deduplication includes:
        digesting the candidate extent to produce a current hash value,
        searching the extent sharing index table for an existing entry having an existing hash value which matches the current hash value,
        when an existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value,
            (i) searching the recently written extent list to confirm that an existing extent, which is identified by the existing entry, is not identified by the recently written extent list,
            (ii) when the existing extent is not identified by the recently written extent list, performing a comprehensive compare operation to determine whether to deduplicate the candidate extent with the existing extent, and
            (iii) when the existing extent is identified by the recently written extent list, adding a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent, and
        when no existing entry in the extent sharing index table is found to have an existing hash value which matches the current hash value, adding a new entry to the extent sharing index table, the new entry having the current hash value and identifying the candidate extent.
    Dependent claims: 17, 18.
19. A computer program product having a non-transitory computer readable medium storing a set of instructions which, when carried out by a computerized device, directs the computerized device to manage deduplication of extents by performing a method comprising:
    constructing, by the computerized device, a recently written extent list which identifies recently written extents stored within memory of the computerized device;
    referencing the recently written extent list to bypass extents identified by the recently written extent list when obtaining a candidate extent for possible deduplication; and
    processing the candidate extent for possible deduplication;
    wherein constructing the recently written extent list which identifies the recently written extents includes:
        from a lock history database which stores a collection of lock information regarding write locks and other locks imposed on the extents stored in the memory, filtering the collection of lock information to ascertain write locked extents satisfying a predefined policy, and
        building the recently written extent list based only on the ascertained write locked extents satisfying the predefined policy.
    Dependent claims: 20, 21.
Specification