Fast deduplication data verification
First Claim
1. A networked information management system configured to verify integrity of deduplication data, the networked information management system comprising:
- a storage manager comprising computer hardware configured to;
retrieve, from an electronically stored deduplication database, a deduplication chunk table, wherein the deduplication chunk table identifies a first data chunk;
retrieve, from a secondary storage subsystem, a first single instance file (SFile) associated with the first data chunk, wherein the first SFile comprises a plurality of SFile containers that each store one or more data blocks;
for each SFile container in the plurality of SFile containers, perform a data integrity verification of corresponding one or more data blocks of the respective SFile container, wherein the data integrity verification includes determining whether the corresponding one or more data blocks of the respective SFile container is readable, and store a value representing a result of the data integrity verification of the corresponding one or more data blocks of the respective SFile container in association with an entry corresponding to the first data chunk in a chunk integrity table;
receive request to verify integrity of at least one data blocks referenced in the plurality of SFile containers; and
for each data block of the at least one data blocks, determine that a first link in the first data chunk references the data block of a corresponding SFile container in the plurality of SFile containers, and verify integrity of the data block of the corresponding SFile container by performing a lookup of the chunk integrity table to determine whether the block is verified instead of by analyzing the data block of the corresponding SFile container.
Abstract
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
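The "only verified once" property described in the abstract can be illustrated with a minimal Python sketch. It is a simplification, not the patented implementation: the chunk integrity table is modeled as a plain dictionary, and `verify_references`, `read_block`, and the block identifiers are hypothetical names introduced only for this example. The sketch also verifies lazily on first reference, whereas the claims describe a verification pass followed by table lookups.

```python
from typing import Callable, Dict, Iterable


def verify_references(
    refs: Iterable[str],
    read_block: Callable[[str], bool],
    integrity: Dict[str, bool],
) -> int:
    """Verify each referenced block at most once.

    `integrity` stands in for the chunk integrity table (assumption: a
    dict keyed by block id). Returns the number of physical reads done.
    """
    reads = 0
    for block_id in refs:
        if block_id not in integrity:
            # First time this block is seen: read it and record the result.
            integrity[block_id] = read_block(block_id)
            reads += 1
        # Otherwise the recorded result is reused and nothing is read.
    return reads


# Hypothetical secondary storage: "blk-c" is missing, so it is unreadable.
storage = {"blk-a": b"x", "blk-b": b"y"}


def read_block(block_id: str) -> bool:
    return block_id in storage


integrity: Dict[str, bool] = {}
monday_backup = ["blk-a", "blk-b"]
tuesday_backup = ["blk-a", "blk-b", "blk-c"]  # shares two deduplicated blocks

reads_monday = verify_references(monday_backup, read_block, integrity)
reads_tuesday = verify_references(tuesday_backup, read_block, integrity)
print(reads_monday, reads_tuesday, integrity)
```

In this toy run the second backup triggers a read only for the block it does not share with the first, which is the overhead reduction the abstract refers to.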
20 Claims
12. A computer-implemented method for verifying integrity of deduplication data, the computer-implemented method comprising:
retrieving, from an electronically stored deduplication database, a deduplication chunk table, wherein the deduplication chunk table identifies a first data chunk;
retrieving, from a secondary storage subsystem, a first single instance file (SFile) associated with the first data chunk, wherein the first SFile comprises a plurality of SFile containers that each store one or more data blocks;
for each SFile container in the plurality of SFile containers, performing a data integrity verification of corresponding one or more data blocks of the respective SFile container, wherein the data integrity verification includes determining whether the corresponding one or more data blocks of the respective SFile container is readable, and storing a value representing a result of the data integrity verification of the corresponding one or more data blocks of the respective SFile container in association with an entry corresponding to the first data chunk in a chunk integrity table;
identifying a request to verify integrity of at least one data blocks referenced in one or more SFile containers in the plurality of the SFile containers; and
for each data block of the at least one data blocks, determining that a first link in the first data chunk references a data block of a corresponding SFile container in the plurality of SFile containers, and verifying integrity of the data block of the corresponding SFile container by performing a lookup of the chunk integrity table to determine whether the data block is verified.
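The two phases recited in the method claim (a per-container verification pass, then lookups against the chunk integrity table) can be sketched as follows. This is an illustrative sketch only: the class names, field layouts, and in-memory dictionaries are assumptions made for the example, and an unreadable block is modeled as a `None` payload rather than an actual read failure on secondary storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SFileContainer:
    container_id: str
    # block_id -> payload; None models a block that cannot be read back
    blocks: Dict[str, Optional[bytes]]


@dataclass
class DataChunk:
    chunk_id: str
    # links: block_id -> id of the container that holds the block
    links: Dict[str, str]


# Hypothetical table layout: chunk_id -> {container_id -> verification result}
IntegrityTable = Dict[str, Dict[str, bool]]


def verify_containers(
    chunk: DataChunk, containers: List[SFileContainer], table: IntegrityTable
) -> None:
    """Phase 1: check readability per container and store the result
    under the chunk's entry in the integrity table."""
    entry = table.setdefault(chunk.chunk_id, {})
    for container in containers:
        entry[container.container_id] = all(
            payload is not None for payload in container.blocks.values()
        )


def verify_blocks(
    chunk: DataChunk, block_ids: List[str], table: IntegrityTable
) -> Dict[str, bool]:
    """Phase 2: answer a verification request by table lookup only,
    without touching the block payloads again."""
    entry = table.get(chunk.chunk_id, {})
    results = {}
    for block_id in block_ids:
        container_id = chunk.links[block_id]  # resolve the link
        # A container with no recorded result is treated as unverified.
        results[block_id] = entry.get(container_id, False)
    return results


containers = [
    SFileContainer("c1", {"b1": b"alpha", "b2": b"beta"}),
    SFileContainer("c2", {"b3": None}),  # unreadable block
]
chunk = DataChunk("chunk-1", {"b1": "c1", "b2": "c1", "b3": "c2"})

table: IntegrityTable = {}
verify_containers(chunk, containers, table)
status = verify_blocks(chunk, ["b1", "b3"], table)
print(status)
```

Storing one result per container, keyed under the chunk's entry, mirrors the claim wording; a real system would presumably persist this table in the deduplication database rather than in memory.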
Specification