Conversion of forms of user data segment IDs in a deduplication system
Abstract
Various embodiments for managing data in a data storage having data deduplication are provided. For a back reference data structure incorporating reference information for at least one user data segment to a storage block, a plurality of hash functions is used to convert between a plurality of form types of user data segment identification (ID's) representative of the at least one user data segment.
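As an illustration only, the conversion between form types can be sketched in Python. The claims state that an x-byte hash value is converted to a y-byte hash value depending on the number of user data segments that reference the storage block. Everything else here is an assumption made for the sketch: the use of BLAKE2b, the names convert_form and form_width_for, the thresholds, and the choice to use shorter forms as the reference count grows. The text shown here names no specific hash function or byte width.

```python
import hashlib


def convert_form(segment_id: bytes, y: int) -> bytes:
    """Convert a segment ID of any width into a y-byte hash value.

    BLAKE2b is a stand-in chosen for this sketch; it supports
    digest sizes from 1 to 64 bytes.
    """
    if not 1 <= y <= 64:
        raise ValueError("y must be between 1 and 64 bytes")
    return hashlib.blake2b(segment_id, digest_size=y).digest()


def form_width_for(ref_count: int) -> int:
    """Pick a form width in bytes from the number of referencing segments.

    The thresholds and widths below are invented for illustration.
    """
    if ref_count <= 16:
        return 8
    if ref_count <= 4096:
        return 4
    return 2


# Usage: re-encode an 8-byte ID once the block has many references.
full_id = bytes(range(8))
short_id = convert_form(full_id, form_width_for(ref_count=5000))
```

The same input always yields the same shorter form, so an ID stored in one form type can be matched against IDs already converted to another.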
9 Claims
1. A method, performed by a processor device, for managing data in a data storage having data deduplication, comprising:
for a back reference data structure incorporating reference information for at least one user data segment to a storage block, using, by the processor device, a plurality of hash functions to convert between a plurality of form types of user data segment identification (ID's) representative of the at least one user data segment, the plurality of form types indicating to a data deduplication system a number of the at least one user data segments which reference the storage block to facilitate efficient reclamation or recovery of failed data in the data deduplication system;
wherein the plurality of hash functions are employed to convert one of the plurality of form types comprising an x-byte hash value to another one of the plurality of form types comprising a y-byte hash value depending on the number of the at least one user data segments which reference the storage block, the x-byte hash value and the y-byte hash value each having a respective number of bytes, wherein the x value is a positive integer and the y value is a positive integer value;
combining at least some of the plurality of hash functions into combined hash functions, wherein combining at least some of the plurality of hash functions and a modulo function into unified hash functions providing a combined result of those of the plurality of hash functions performing form type conversions; and
using a number of buckets in a hash table of a final form type of the back reference data structure; and
performs applying the modulo function to an additional hashed value, wherein the number of buckets is applied to obtain a serial number of a particular bucket for storing a particular user data segment ID; and
storing the particular user data segment ID.
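The bucket-selection steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. BLAKE2b, the 8-byte width of the additional hashed value, and the names hash_to_width, bucket_serial, and BackReferenceTable are all hypothetical. The sketch chains several form-type conversions with a final modulo into one function, then uses the bucket count to obtain the serial number of the bucket that stores the ID.

```python
import hashlib
from typing import List, Sequence


def hash_to_width(value: bytes, width: int) -> bytes:
    """One form-type conversion: hash a value down to `width` bytes (assumed BLAKE2b)."""
    return hashlib.blake2b(value, digest_size=width).digest()


def bucket_serial(segment_id: bytes, widths: Sequence[int], num_buckets: int) -> int:
    """Unified function: chained conversions followed by a modulo.

    `widths` lists the byte widths of the successive form types (hypothetical).
    """
    value = segment_id
    for width in widths:
        value = hash_to_width(value, width)
    # Additional hashed value, reduced modulo the number of buckets.
    extra = hash_to_width(value, 8)
    return int.from_bytes(extra, "big") % num_buckets


class BackReferenceTable:
    """Hash table standing in for the final form type of the back reference structure."""

    def __init__(self, num_buckets: int, widths: Sequence[int]) -> None:
        self.widths = tuple(widths)
        self.buckets: List[List[bytes]] = [[] for _ in range(num_buckets)]

    def store(self, segment_id: bytes) -> int:
        """Store the ID in its bucket and return that bucket's serial number."""
        serial = bucket_serial(segment_id, self.widths, len(self.buckets))
        self.buckets[serial].append(segment_id)
        return serial


# Usage: 16 buckets, with IDs passing through 8-byte and 4-byte forms.
table = BackReferenceTable(num_buckets=16, widths=(8, 4))
serial = table.store(b"segment-0001")
```

Folding the conversions and the modulo into one call means a caller goes from a segment ID to a bucket serial number in a single step.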
4. A system for managing data in a data storage having data deduplication, comprising:
a processor device, operational in the data storage, wherein the processor device, for a back reference data structure incorporating reference information for at least one user data segment to a storage block, uses a plurality of hash functions to convert between a plurality of form types of user data segment identification (ID's) representative of the at least one user data segment, the plurality of form types indicating to a data deduplication system a number of the at least one user data segments which reference the storage block to facilitate efficient reclamation or recovery of failed data in the data deduplication system;
wherein the plurality of hash functions are employed to convert one of the plurality of form types comprising an x-byte hash value to another one of the plurality of form types comprising a y-byte hash value depending on the number of the at least one user data segments which reference the storage block, the x-byte hash value and the y-byte hash value each having a respective number of bytes, wherein the x value is a positive integer and the y value is a positive integer value;
combines at least some of the plurality of hash functions into combined hash functions, wherein the processor combines at least some of the plurality of hash functions and a modulo function into unified hash functions providing a combined result of those of the plurality of hash functions performing form type conversions, and wherein the processor, pursuant to using the plurality of hash operations, uses a number of buckets in a hash table of a final form type of the back reference data structure; and
performs applying the modulo function to an additional hashed value, wherein the number of buckets is applied to obtain a serial number of a particular bucket for storing a particular user data segment ID; and
storing the particular user data segment ID in a memory in electrical communication with the processor device.
7. A computer program product for managing data in a data storage having data deduplication, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that, for a back reference data structure incorporating reference information for at least one user data segment to a storage block, uses, by a processor device, a plurality of hash functions to convert between a plurality of form types of user data segment identification (ID's) representative of the at least one user data segment, the plurality of form types indicating to a data deduplication system a number of the at least one user data segments which reference the storage block to facilitate efficient reclamation or recovery of failed data in the data deduplication system;
wherein the plurality of hash functions are employed to convert one of the plurality of form types comprising an x-byte hash value to another one of the plurality of form types comprising a y-byte hash value depending on the number of the at least one user data segments which reference the storage block, the x-byte hash value and the y-byte hash value each having a respective number of bytes, wherein the x value is a positive integer and the y value is a positive integer value;
a second executable portion that combines at least some of the plurality of hash functions into combined hash functions;
a third executable portion that combines at least some of the plurality of hash functions and a modulo function into unified hash functions providing a combined result of those of the plurality of hash functions performing form type conversions; and
the second executable portion that, pursuant to using the plurality of hash operations, uses a number of buckets in a hash table of a final form type of the back reference data structure; and
performs applying the modulo function to an additional hashed value, wherein the number of buckets is applied to obtain a serial number of a particular bucket for storing a particular user data segment ID, and store a particular user data segment ID.
Specification