Fast deduplication data verification
First Claim
1. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising:
- a storage manager comprising computer hardware configured to:
retrieve, from an electronically stored deduplication database, a deduplication chunk table, wherein the deduplication chunk table stores, for a first data chunk identified in a primary table, a first value and a second value, wherein the first value is based on first primary identifications of first data blocks in the first data chunk and the second value comprises a sum of squared first primary identifications;
retrieve, from a secondary storage subsystem, a first single instance file (SFile) associated with the first data chunk;
determine a first SFile value based on second primary identifications of second data blocks stored in the first SFile;
determine a second SFile value based on the second primary identifications of the second data blocks stored in the first SFile;
compare the first value with the first SFile value and the second value with the second SFile value; and
determine that the primary table and the first SFile are synched in response to a determination that the first value matches the first SFile value and the second value matches the second SFile value.
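The comparison recited in the claim can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the claim says the first value is "based on" the primary identifications without fixing a formula, so the plain sum used here is an assumption, and the names `ChunkSummary`, `summarize`, and `is_synced` are hypothetical. Only the second value (a sum of squared primary identifications) is taken directly from the claim text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChunkSummary:
    """Two values kept per data chunk (hypothetical structure)."""
    first_value: int   # ASSUMPTION: plain sum of the primary identifications
    second_value: int  # sum of squared primary identifications (per the claim)


def summarize(primary_ids: Iterable[int]) -> ChunkSummary:
    """Compute both values from a collection of primary identifications."""
    ids = list(primary_ids)
    return ChunkSummary(
        first_value=sum(ids),
        second_value=sum(i * i for i in ids),
    )


def is_synced(table_entry: ChunkSummary, sfile_primary_ids: Iterable[int]) -> bool:
    """Treat the table entry and the SFile as synched only if both values match."""
    sfile = summarize(sfile_primary_ids)
    return (
        table_entry.first_value == sfile.first_value
        and table_entry.second_value == sfile.second_value
    )


# Entry as it might be stored in the deduplication chunk table for one chunk.
stored = summarize([1, 4])

# Same identifications read back from the SFile, in a different order.
print(is_synced(stored, [4, 1]))

# Different identifications with the same sum (5) but different
# sums of squares (17 vs 13): the second value exposes the mismatch.
print(is_synced(stored, [2, 3]))
```

Under the sum assumption, the second call shows why a single value would not be enough: the identifications {1, 4} and {2, 3} share a sum, and only the sum of squares distinguishes them. Matching on both values is still a summary check rather than an element-by-element comparison.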
Abstract
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
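The "verified only once" behavior described in the abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the chunk integrity table is modeled as a plain dictionary from block identifier to verification result, and `verify_references` and the `verify_block` callback are hypothetical names, not taken from the patent.

```python
from typing import Callable, Dict, Iterable


def verify_references(
    references: Iterable[str],
    integrity_table: Dict[str, bool],
    verify_block: Callable[[str], bool],
) -> int:
    """Verify each referenced block at most once.

    integrity_table stands in for the chunk integrity table (ASSUMPTION:
    a dict mapping block id -> verification result). Returns the number
    of verifications actually performed.
    """
    performed = 0
    for block_id in references:
        if block_id in integrity_table:
            # Already verified through an earlier reference; skip the work.
            continue
        integrity_table[block_id] = verify_block(block_id)
        performed += 1
    return performed


# A backup that references three distinct blocks five times in total.
references = ["b1", "b2", "b1", "b3", "b2"]
integrity: Dict[str, bool] = {}

# Placeholder verifier that always succeeds (hypothetical).
performed = verify_references(references, integrity, lambda block_id: True)
print(performed)  # 3 verifications for 5 references
```

In this sketch the saving grows with the deduplication ratio: the more often a block is referenced, the more redundant verifications are skipped.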
20 Claims
1. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising:
- a storage manager comprising computer hardware configured to:
retrieve, from an electronically stored deduplication database, a deduplication chunk table, wherein the deduplication chunk table stores, for a first data chunk identified in a primary table, a first value and a second value, wherein the first value is based on first primary identifications of first data blocks in the first data chunk and the second value comprises a sum of squared first primary identifications;
retrieve, from a secondary storage subsystem, a first single instance file (SFile) associated with the first data chunk;
determine a first SFile value based on second primary identifications of second data blocks stored in the first SFile;
determine a second SFile value based on the second primary identifications of the second data blocks stored in the first SFile;
compare the first value with the first SFile value and the second value with the second SFile value; and
determine that the primary table and the first SFile are synched in response to a determination that the first value matches the first SFile value and the second value matches the second SFile value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method for verifying synchronization of deduplication information, the computer-implemented method comprising:
- retrieving, from an electronically stored deduplication database, a deduplication chunk table, wherein the deduplication chunk table stores, for a first data chunk identified in a primary table, a first value and a second value, wherein the first value is based on first primary identifications of first data blocks in the first data chunk and the second value comprises a sum of squared first primary identifications;
- retrieving, from a secondary storage subsystem, a first single instance file (SFile) associated with the first data chunk;
- determining a first SFile value based on second primary identifications of second data blocks stored in the first SFile;
- determining a second SFile value based on the second primary identifications of the second data blocks stored in the first SFile;
- comparing the first value with the first SFile value and the second value with the second SFile value; and
- determining that the primary table and the first SFile are synched in response to a determination that the first value matches the first SFile value and the second value matches the second SFile value.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification