Fast deduplication data verification
First Claim
1. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising:
- a data storage computer comprising computer hardware configured to:
retrieve, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block;
generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks;
generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications;
store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table;
store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table;
store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and
compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
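The two per-chunk values recited above can be illustrated with a minimal sketch. The claim only says the first value is "based on" the primary identifications; this sketch assumes it is a plain sum. The row layout, the field names (`id_sum`, `id_sq_sum`), and both function names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_chunk_table(primary_rows):
    """Build a deduplication chunk table from (primary_id, chunk_id) rows.

    Assumption: the "first value" is the sum of the primary IDs; the
    "second value" is the sum of the squared primary IDs, as the claim recites.
    """
    table = defaultdict(lambda: {"id_sum": 0, "id_sq_sum": 0})
    for primary_id, chunk_id in primary_rows:
        entry = table[chunk_id]
        entry["id_sum"] += primary_id
        entry["id_sq_sum"] += primary_id * primary_id
    return dict(table)


def chunk_is_synchronized(entry, instance_ids):
    """Compare stored values with values derived from an instance file.

    instance_ids stands in for the primary IDs read from the instance
    file of one chunk (hypothetical representation).
    """
    return (
        entry["id_sum"] == sum(instance_ids)
        and entry["id_sq_sum"] == sum(i * i for i in instance_ids)
    )


# Hypothetical primary table: blocks 1 and 4 live in chunk-A, 2 and 3 in chunk-B.
primary_rows = [(1, "chunk-A"), (4, "chunk-A"), (2, "chunk-B"), (3, "chunk-B")]
chunk_table = build_chunk_table(primary_rows)

# The instance file agrees with the primary table.
in_sync = chunk_is_synchronized(chunk_table["chunk-A"], [1, 4])

# IDs 2 and 3 also sum to 5, so the sum alone would not notice the mismatch;
# the sum of squares (13 versus 17) does.
out_of_sync = chunk_is_synchronized(chunk_table["chunk-A"], [2, 3])
```

Under these assumptions, keeping both values lets a mismatch be detected from two integers per chunk, without comparing the full lists of block identifications.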
Abstract
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
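The "verified once" behavior described in the abstract can be sketched as follows. The layout of the chunk integrity table is not given in this section, so the sketch models it as a dictionary from block identification to verification result; that layout, the function names, and the `verify_block` callback are all assumptions made for illustration.

```python
def verify_backup(block_refs, verify_block, integrity_table=None):
    """Verify each referenced block at most once.

    block_refs      : block IDs referenced by a backup; deduplicated blocks
                      may appear many times (hypothetical input shape).
    verify_block    : callable that checks one block and returns True/False.
    integrity_table : assumed stand-in for the chunk integrity table,
                      mapping block ID to the recorded verification result.
    """
    if integrity_table is None:
        integrity_table = {}
    for block_id in block_refs:
        if block_id in integrity_table:
            continue  # already verified; skip the expensive check
        integrity_table[block_id] = verify_block(block_id)
    return integrity_table


# Usage: block 7 is referenced three times but checked only once.
checked = []


def fake_verify(block_id):
    checked.append(block_id)
    return True


results = verify_backup([7, 8, 7, 9, 7], fake_verify)
```

Recording results by block identification is what would avoid repeated work in this sketch: the cost of verification scales with the number of distinct blocks rather than the number of references to them.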
17 Claims
1. A networked information management system configured to verify synchronization of deduplication information, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6
7. A computer-implemented method for verifying synchronization of deduplication information, the computer-implemented method comprising:
retrieving, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block;
generating, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks;
generating, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications;
storing, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table;
storing, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table;
storing, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and
comparing, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
Dependent claims: 8, 9, 10, 11, 12
13. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising:
- a storage manager comprising computer hardware configured to receive a request to verify data in a backup;
- a deduplication database media agent comprising an electronically stored deduplication database and computer hardware configured to:
retrieve, from the deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block;
generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks;
generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications;
store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table;
store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table;
store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and
compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
Dependent claims: 14, 15, 16, 17
Specification