System and method for retrieving and using block fingerprints for data deduplication
First Claim
1. A method for identifying duplicate data in a storage system, the method comprising:
storing data in a plurality of data blocks;
calculating a plurality of checksum values, each checksum value associated with the data of each data block;
storing each checksum value in a fingerprint of a plurality of fingerprints, each fingerprint associated with a data block of the plurality of data blocks; and
identifying duplicate data blocks of the plurality of data blocks by identifying identical fingerprints of the plurality of fingerprints.
Abstract
A system and method for calculating and storing block fingerprints for data deduplication. A fingerprint extraction layer generates a fingerprint of a predefined size, e.g., 64 bits, for each data block stored by a storage system. Each fingerprint is stored in a fingerprint record, and the fingerprint records are, in turn, stored in a fingerprint database for access by the data deduplication module. The data deduplication module may periodically compare the fingerprints to identify duplicate fingerprints, which, in turn, indicate duplicate data blocks.
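The fingerprint flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the 4096-byte block size, the use of a truncated SHA-256 digest as the 64-bit fingerprint, and the names `fingerprint`, `split_blocks`, `build_fingerprint_db`, and `find_duplicates` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


def fingerprint(block: bytes) -> int:
    """Return a 64-bit fingerprint for a block.

    Stand-in choice: the first 8 bytes of a SHA-256 digest. The abstract
    only says the fingerprint has a predefined size, e.g., 64 bits.
    """
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut a byte string into fixed-size data blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_fingerprint_db(blocks) -> list:
    """Create one fingerprint record (fingerprint, block number) per block."""
    return [(fingerprint(block), number) for number, block in enumerate(blocks)]


def find_duplicates(records) -> list:
    """Group block numbers by fingerprint and keep groups with more than one member."""
    groups = defaultdict(list)
    for fp, number in records:
        groups[fp].append(number)
    return [numbers for numbers in groups.values() if len(numbers) > 1]


# Three blocks, of which the first and third hold identical data.
blocks = split_blocks(b"A" * 4096 + b"B" * 4096 + b"A" * 4096)
records = build_fingerprint_db(blocks)
duplicates = find_duplicates(records)
```

Here `duplicates` lists the block numbers that share a fingerprint. Because a 64-bit value can collide for different data, a real system would likely treat such groups as candidates and confirm them before sharing blocks.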
27 Claims
1. A method for identifying duplicate data in a storage system, the method comprising:
storing data in a plurality of data blocks;
calculating a plurality of checksum values, each checksum value associated with the data of each data block;
storing each checksum value in a fingerprint of a plurality of fingerprints, each fingerprint associated with a data block of the plurality of data blocks; and
identifying duplicate data blocks of the plurality of data blocks by identifying identical fingerprints of the plurality of fingerprints.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A storage operating system adapted to identify duplicate data, the storage operating system comprising:
a file system layer;
a storage module layer cooperating with the file system layer to process a write operation, the storage module layer further adapted to store a data block and to calculate a checksum value associated with the data block;
a fingerprint extraction layer adapted to store the checksum value to a fingerprint database, wherein the fingerprint database comprises a plurality of checksum values; and
a deduplication module adapted to identify duplicate data blocks by identifying identical checksum values in the fingerprint database.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
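The layered arrangement recited in claim 18 can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the class names mirror the claim language, `zlib.adler32` stands in for the unspecified checksum, and the byte-for-byte comparison in the deduplication module is an added safeguard against checksum collisions that the claim itself does not recite.

```python
import zlib
from collections import defaultdict


class FingerprintDatabase:
    """Holds (checksum, block number) records."""

    def __init__(self):
        self.records = []

    def add(self, checksum: int, block_number: int) -> None:
        self.records.append((checksum, block_number))


class StorageModuleLayer:
    """Stores each data block and calculates its checksum during the write."""

    def __init__(self):
        self.blocks = []

    def write_block(self, data: bytes):
        self.blocks.append(data)
        # adler32 is an assumed stand-in for the checksum in the claim
        return len(self.blocks) - 1, zlib.adler32(data)


class FingerprintExtractionLayer:
    """Copies the checksum from the write path into the fingerprint database."""

    def __init__(self, database: FingerprintDatabase):
        self.database = database

    def record(self, block_number: int, checksum: int) -> None:
        self.database.add(checksum, block_number)


class FileSystemLayer:
    """Entry point for writes; cooperates with the storage module layer."""

    def __init__(self, storage: StorageModuleLayer, extractor: FingerprintExtractionLayer):
        self.storage = storage
        self.extractor = extractor

    def write(self, data: bytes) -> int:
        block_number, checksum = self.storage.write_block(data)
        self.extractor.record(block_number, checksum)
        return block_number


class DeduplicationModule:
    """Finds blocks whose checksums match an earlier block."""

    def __init__(self, database: FingerprintDatabase, storage: StorageModuleLayer):
        self.database = database
        self.storage = storage

    def find_duplicates(self) -> list:
        by_checksum = defaultdict(list)
        for checksum, block_number in self.database.records:
            by_checksum[checksum].append(block_number)
        result = []
        for numbers in by_checksum.values():
            first = numbers[0]
            # Simplification: later blocks are compared only against the first
            # block in the group. The byte comparison is an assumption, not
            # part of the claim.
            dups = [n for n in numbers[1:] if self.storage.blocks[n] == self.storage.blocks[first]]
            if dups:
                result.append((first, dups))
        return result


database = FingerprintDatabase()
storage = StorageModuleLayer()
fs = FileSystemLayer(storage, FingerprintExtractionLayer(database))
for payload in (b"alpha" * 10, b"beta", b"alpha" * 10):
    fs.write(payload)
found = DeduplicationModule(database, storage).find_duplicates()
```

In this sketch the checksum is computed once, on the write path, so the deduplication pass only has to read the fingerprint database rather than recompute checksums over stored data.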
27. A computer readable medium for identifying duplicate data in a storage system, the computer readable medium including program instructions for performing the steps of:
storing data in a plurality of data blocks;
calculating a plurality of checksum values, each checksum value associated with the data of each data block;
storing each checksum value in a fingerprint of a plurality of fingerprints, each fingerprint associated with a data block of the plurality of data blocks; and
identifying duplicate data blocks of the plurality of data blocks by identifying identical fingerprints of the plurality of fingerprints.
Specification