System and method for retrieving and using block fingerprints for data deduplication
Abstract
A system and method for calculating and storing block fingerprints for data deduplication. A fingerprint extraction layer generates a fingerprint of a predefined size, e.g., 64 bits, for each data block stored by a storage system. Each fingerprint is stored in a fingerprint record, and the fingerprint records are, in turn, stored in a fingerprint database for access by the data deduplication module. The data deduplication module may periodically compare the fingerprints to identify duplicate fingerprints, which, in turn, indicate duplicate data blocks.
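The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The 4 KB block size, the `FingerprintRecord` fields, and the use of the first 8 bytes of a SHA-256 digest as the 64-bit fingerprint are all hypothetical choices made for the example. The abstract itself specifies only a fingerprint of predefined size (e.g., 64 bits) per data block.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


@dataclass(frozen=True)
class FingerprintRecord:
    """Hypothetical record layout: a 64-bit fingerprint and the block it describes."""
    fingerprint: int
    block_number: int


def fingerprint_block(data: bytes) -> int:
    """Return a 64-bit fingerprint of a block.

    Assumption: a truncated SHA-256 digest stands in for whatever
    checksum the storage system actually computes.
    """
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def build_fingerprint_database(blocks: list[bytes]) -> list[FingerprintRecord]:
    """Create one fingerprint record per stored data block."""
    return [FingerprintRecord(fingerprint_block(b), n) for n, b in enumerate(blocks)]


def find_duplicate_fingerprints(db: list[FingerprintRecord]) -> dict[int, list[int]]:
    """Group block numbers by fingerprint and keep only groups with more than one block."""
    groups = defaultdict(list)
    for record in db:
        groups[record.fingerprint].append(record.block_number)
    return {fp: nums for fp, nums in groups.items() if len(nums) > 1}


# Example: blocks 0 and 2 hold identical data, so they share a fingerprint.
blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
database = build_fingerprint_database(blocks)
duplicates = find_duplicate_fingerprints(database)
```

In this sketch the periodic comparison is a single pass that groups records by fingerprint. A fixed-size fingerprint can in principle match for blocks with different contents, so a real system would likely confirm each candidate pair before sharing storage; that step is not shown here.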
27 Claims
1. A method for identifying duplicate data in a storage system, the method comprising:
- storing data in a plurality of data blocks;
- calculating a plurality of checksum values, each checksum value associated with the data of each data block;
- storing each checksum value in a fingerprint of a plurality of fingerprints, each fingerprint associated with a data block of the plurality of data blocks; and
- identifying duplicate data blocks of the plurality of data blocks by identifying identical fingerprints of the plurality of fingerprints.
Dependent claims: 2-17.
18. A storage operating system adapted to identify duplicate data, the storage operating system comprising:
- a file system layer;
- a storage module layer cooperating with the file system layer to process a write operation, the storage module layer further adapted to store a data block and to calculate a checksum value associated with the data block;
- a fingerprint extraction layer adapted to store the checksum value to a fingerprint database, wherein the fingerprint database comprises a plurality of checksum values; and
- a deduplication module adapted to identify duplicate data blocks by identifying identical checksum values in the fingerprint database.
Dependent claims: 19-26.
27. A computer readable medium for identifying duplicate data in a storage system, the computer readable medium including program instructions for performing the steps of:
- storing data in a plurality of data blocks;
- calculating a plurality of checksum values, each checksum value associated with the data of each data block;
- storing each checksum value in a fingerprint of a plurality of fingerprints, each fingerprint associated with a data block of the plurality of data blocks; and
- identifying duplicate data blocks of the plurality of data blocks by identifying identical fingerprints of the plurality of fingerprints.
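The layering recited in claim 18 can be pictured with the short sketch below. It is illustrative only. The class and method names, the in-memory lists standing in for disk and database storage, and the use of CRC-32 as the checksum are assumptions made for the example; the claim requires only that the storage module layer calculate a checksum value for each block and that the fingerprint extraction layer store it to a fingerprint database.

```python
import zlib


class FingerprintDatabase:
    """Holds the checksum values recorded for stored blocks (in memory here)."""

    def __init__(self):
        self.entries = []  # (checksum, block_number) pairs

    def add(self, checksum, block_number):
        self.entries.append((checksum, block_number))


class FingerprintExtractionLayer:
    """Stores each checksum handed over by the storage module layer in the database."""

    def __init__(self, database):
        self.database = database

    def on_block_written(self, checksum, block_number):
        self.database.add(checksum, block_number)


class StorageModuleLayer:
    """Stores a data block and calculates its checksum as part of the write."""

    def __init__(self, extraction_layer):
        self.blocks = []  # stand-in for on-disk blocks
        self.extraction_layer = extraction_layer

    def write_block(self, data):
        block_number = len(self.blocks)
        self.blocks.append(data)
        checksum = zlib.crc32(data)  # assumed checksum; the claim names no algorithm
        self.extraction_layer.on_block_written(checksum, block_number)
        return block_number


class FileSystemLayer:
    """Splits a write operation into blocks and passes them to the storage module layer."""

    def __init__(self, storage, block_size=4096):
        self.storage = storage
        self.block_size = block_size  # assumed block size

    def write(self, data):
        size = self.block_size
        return [self.storage.write_block(data[i:i + size])
                for i in range(0, len(data), size)]


class DeduplicationModule:
    """Identifies blocks whose checksum values in the database are identical."""

    def __init__(self, database):
        self.database = database

    def find_duplicates(self):
        by_checksum = {}
        for checksum, block_number in self.database.entries:
            by_checksum.setdefault(checksum, []).append(block_number)
        return [nums for nums in by_checksum.values() if len(nums) > 1]


# Example wiring: a 12-byte write with 4-byte blocks; blocks 0 and 2 are identical.
db = FingerprintDatabase()
fs = FileSystemLayer(StorageModuleLayer(FingerprintExtractionLayer(db)), block_size=4)
fs.write(b"aaaabbbbaaaa")
duplicate_groups = DeduplicationModule(db).find_duplicates()
```

Computing the checksum inside the write path means the deduplication module never has to reread block data to find candidates; it works from the fingerprint database alone.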
Specification