System and method for retrieving and using block fingerprints for data deduplication
Abstract
A system and method for calculating and storing block fingerprints for data deduplication. A fingerprint extraction layer generates a fingerprint of a predefined size, e.g., 64 bits, for each data block stored by a storage system. Each fingerprint is stored in a fingerprint record, and the fingerprint records are, in turn, stored in a fingerprint database for access by a data deduplication module. The data deduplication module may periodically compare the fingerprints to identify duplicate fingerprints, which, in turn, indicate duplicate data blocks.
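As an illustration of the comparison step described in the abstract, the following Python sketch finds fingerprint records that share a fingerprint. The record fields, the names `FingerprintRecord` and `find_duplicate_groups`, and the sort-then-group strategy are assumptions made for illustration; the abstract does not say how the deduplication module performs the comparison. Because a 64-bit fingerprint can collide, records with matching fingerprints are best treated as candidate duplicates.

```python
from itertools import groupby
from typing import NamedTuple


class FingerprintRecord(NamedTuple):
    """Hypothetical minimal record: a 64-bit fingerprint and the block's location."""
    fingerprint: int  # 64-bit fingerprint value
    inode: int        # inode number associated with the block
    fbn: int          # file block number (offset within the file)


def find_duplicate_groups(records: list[FingerprintRecord]) -> list[list[FingerprintRecord]]:
    """Group records that share a fingerprint; each group holds candidate duplicates."""
    # Sorting places equal fingerprints next to each other, so a single
    # linear pass (groupby) is enough to find every run of matches.
    ordered = sorted(records, key=lambda r: r.fingerprint)
    groups = []
    for _, run in groupby(ordered, key=lambda r: r.fingerprint):
        members = list(run)
        if len(members) > 1:  # a fingerprint seen only once has no duplicate
            groups.append(members)
    return groups


fingerprint_db = [
    FingerprintRecord(0xDEADBEEF00000001, inode=96, fbn=0),
    FingerprintRecord(0x0123456789ABCDEF, inode=97, fbn=3),
    FingerprintRecord(0xDEADBEEF00000001, inode=98, fbn=7),
]
for group in find_duplicate_groups(fingerprint_db):
    print([(r.inode, r.fbn) for r in group])  # [(96, 0), (98, 7)]
```

Sorting keeps the pass over the database linear after an O(n log n) sort; a hash table keyed by fingerprint would be an equally valid choice for this sketch.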
22 Claims
1. A method, comprising:
storing data in a plurality of data blocks serviced by a storage system having a processor;
receiving a write operation directed to a data block of the plurality of data blocks;
performing a first computation to generate a checksum value for the data block to verify data integrity of the write operation;
generating, without requiring a second computation, a fingerprint of the data block to identify duplicate data of the plurality of data blocks by storing at least a portion of the checksum value generated to verify data integrity in the fingerprint and storing at least a portion of data from the data block in the fingerprint;
storing, in a fingerprint record, a copy of extracted metadata associated with the data block, wherein the metadata includes a generation number of an index node (inode) associated with the data block;
storing the fingerprint in the fingerprint record;
deleting the inode associated with the data block;
reallocating the inode in response to a new write operation;
modifying the generation number of the inode associated with the data block in response to the new write operation; and
eliminating the fingerprint record in response to the generation number of the fingerprint record differing from the modified generation number.
Dependent claims: 2 through 14.
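Claim 1 builds the fingerprint from the checksum already computed for the write, so no second hash pass over the block is needed. The sketch below is a minimal illustration under stated assumptions: CRC-32 (`zlib.crc32`) stands in for the integrity checksum, and the 64-bit layout (checksum in the high 32 bits, the block's first four bytes in the low 32 bits) is hypothetical; the claim only requires that at least a portion of each be stored in the fingerprint. The names `write_checksum` and `make_fingerprint` are illustrative.

```python
import zlib


def write_checksum(block: bytes) -> int:
    """Integrity checksum computed once on the write path.

    Assumption: CRC-32 stands in for whatever checksum the storage system uses.
    """
    return zlib.crc32(block)  # unsigned 32-bit value in Python 3


def make_fingerprint(checksum: int, block: bytes) -> int:
    """Assemble a 64-bit fingerprint from existing values; no second hash is run.

    Assumed layout (hypothetical): high 32 bits = the write-path checksum,
    low 32 bits = a 4-byte sample copied from the start of the block.
    """
    sample = int.from_bytes(block[:4], "big")
    return ((checksum & 0xFFFFFFFF) << 32) | sample


BLOCK_SIZE = 4096  # assumed block size, for the example only
block_a = b"hello world".ljust(BLOCK_SIZE, b"\0")
block_b = bytes(block_a)                       # identical contents
block_c = b"goodbye".ljust(BLOCK_SIZE, b"\0")  # different contents

fp_a = make_fingerprint(write_checksum(block_a), block_a)
fp_b = make_fingerprint(write_checksum(block_b), block_b)
fp_c = make_fingerprint(write_checksum(block_c), block_c)
print(fp_a == fp_b)  # True
print(fp_a == fp_c)  # False
```

The only extra work per block is a shift, a mask, and a copy of four bytes, which reflects the claim's point that fingerprinting piggybacks on the integrity checksum rather than requiring its own computation.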
15. A computer configured to identify duplicate data, comprising:
a processor operatively connected to the computer configured to execute a storage operating system to issue a write operation directed to data of a storage device operatively connected to the computer;
a first module of the storage operating system configured to process the write operation;
the first module configured to perform a first computation to generate a checksum value for the data to verify data integrity of the write operation;
the first module further configured to complete the write operation;
a second module of the storage operating system configured to generate a fingerprint of the data to identify duplicate data, the fingerprint comprising at least a portion of the checksum value generated to verify data integrity and at least a portion of the data;
wherein the storage operating system is further configured to extract metadata associated with the data, wherein the second module is further configured to store the checksum value and a copy of the metadata in a fingerprint record; and
wherein the second module is further configured to store an index node (inode) number and a file block number (fbn) in the fingerprint record, wherein the inode number identifies an inode associated with the data and the fbn indicates an offset within a data container, and wherein the metadata comprises:
a generation number configured to indicate reallocation of the data, and
a consistency point (CP) count configured to indicate modification of the data.
Dependent claims: 16 through 21.
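Claim 15 lists the contents of a fingerprint record: the fingerprint, an inode number, a file block number (fbn), a generation number, and a CP count. One possible fixed-size encoding is sketched below with Python's `struct` module. The field order, the widths (64-bit fingerprint, inode number, and fbn; 32-bit generation number and CP count), and the little-endian byte order are assumptions for illustration, not taken from the claim.

```python
import struct
from dataclasses import dataclass

# Assumed layout: little-endian, no padding. 64-bit fingerprint, inode number
# and fbn, followed by a 32-bit generation number and a 32-bit CP count.
RECORD_FORMAT = "<QQQII"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 8 + 8 + 8 + 4 + 4 = 32 bytes


@dataclass(frozen=True)
class FingerprintRecord:
    fingerprint: int  # checksum portion plus data sample
    inode: int        # identifies the inode associated with the data
    fbn: int          # file block number: offset within the data container
    generation: int   # indicates reallocation of the data (inode reuse)
    cp_count: int     # consistency point count: indicates modification of the data

    def pack(self) -> bytes:
        """Serialize the record into a fixed-size byte string."""
        return struct.pack(RECORD_FORMAT, self.fingerprint, self.inode,
                           self.fbn, self.generation, self.cp_count)

    @classmethod
    def unpack(cls, raw: bytes) -> "FingerprintRecord":
        """Rebuild a record from bytes produced by pack()."""
        return cls(*struct.unpack(RECORD_FORMAT, raw))


rec = FingerprintRecord(0xDEADBEEF68656C6C, inode=96, fbn=12, generation=3, cp_count=1041)
raw = rec.pack()
print(len(raw))                              # 32
print(FingerprintRecord.unpack(raw) == rec)  # True
```

A fixed record size lets a fingerprint database be scanned or sorted as an array of equal-length entries; whether the patented system stores records this way is not stated in the claims.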
22. A non-transitory computer readable storage medium containing program instructions for execution by a processor, comprising:
program instructions that store data in a plurality of data blocks;
program instructions that receive a write operation directed to a data block of the plurality of data blocks;
program instructions that perform a first computation to generate a checksum value for the data block to verify data integrity of the write operation;
program instructions that generate a fingerprint of the data block by storing at least a portion of the checksum value generated to verify data integrity in the fingerprint;
program instructions that store a copy of extracted metadata associated with the data block in a fingerprint record, wherein the metadata includes a generation number of an index node (inode) associated with the data block;
program instructions that store the fingerprint in the fingerprint record;
program instructions that delete the inode associated with the data block;
program instructions that reallocate the inode in response to a new write operation;
program instructions that modify the generation number of the inode associated with the data block in response to the new write operation; and
program instructions that eliminate the fingerprint record in response to the generation number in the fingerprint record differing from the modified generation number.
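The delete, reallocate, and eliminate steps of claim 22 (which mirror claim 1) use the generation number to detect fingerprint records that refer to an earlier incarnation of a reused inode. The following sketch models this with a hypothetical in-memory `InodeTable` that tracks only allocation state and generation numbers; the class and function names, the increment-by-one policy, and the list-based record store are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintRecord:
    fingerprint: int
    inode: int
    generation: int  # copy of the inode's generation number when the record was stored


class InodeTable:
    """Hypothetical inode table that tracks only allocation state and generation numbers."""

    def __init__(self) -> None:
        self.generation: dict[int, int] = {}  # inode number -> current generation
        self.in_use: set[int] = set()

    def allocate(self, inode: int) -> int:
        """Allocate (or reallocate) an inode and return its new generation number."""
        if inode in self.in_use:
            raise ValueError(f"inode {inode} is already allocated")
        # Assumed policy: each allocation increments the generation number, so a
        # reused inode number can be told apart from its previous incarnation.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.in_use.add(inode)
        return self.generation[inode]

    def delete(self, inode: int) -> None:
        """Free the inode; its generation number is kept for the next allocation."""
        self.in_use.discard(inode)


def eliminate_stale(records: list[FingerprintRecord], inodes: InodeTable) -> list[FingerprintRecord]:
    """Drop records whose stored generation differs from the inode's current one."""
    return [r for r in records if inodes.generation.get(r.inode) == r.generation]


inodes = InodeTable()
records = [FingerprintRecord(0xAAAA, inode=96, generation=inodes.allocate(96))]  # generation 1

inodes.delete(96)              # the file is deleted
new_gen = inodes.allocate(96)  # a new write reuses inode 96: generation 2
records.append(FingerprintRecord(0xBBBB, inode=96, generation=new_gen))

records = eliminate_stale(records, inodes)
print([hex(r.fingerprint) for r in records])  # ['0xbbbb']
```

In this sketch a stale record is caught only once its inode number has been reused; the claims do not describe how records for deleted but not yet reallocated inodes are handled.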
Specification