Fast deduplication data verification
First Claim
1. A networked information management system configured to verify deduplication information, the networked information management system comprising:
- a storage manager comprising computer hardware configured to:
retrieve, from an electronically stored deduplication database, a chunk integrity table, wherein the chunk integrity table identifies a first data chunk, wherein the first data chunk is associated with a plurality of single instance file (SFile) containers, and wherein the chunk integrity table stores, for each SFile container in the plurality of SFile containers, an indication of whether the respective SFile container is verified;
retrieve, from a secondary storage subsystem, the first data chunk, wherein the first data chunk comprises one or more data blocks;
for each data block in the first data chunk, determine whether the respective data block comprises a link to one SFile container in the plurality of SFile containers;
perform a verification of the respective data block in response to a determination that the respective data block does not comprise the link; and
determine a verification status of the respective data block based on the indications of whether the SFile containers are verified stored in the chunk integrity table in response to a determination that the respective data block comprises the link.
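The per-block decision recited in the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `DataBlock` type, the `sfile_link` field, the dictionary used as the chunk integrity table, and the `verify_block` callback are all hypothetical names introduced here. The sketch also assumes that a linked block takes the verified flag of the container it links to, and that a container missing from the table counts as unverified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataBlock:
    payload: bytes
    # Hypothetical field: ID of the SFile container this block links to,
    # or None when the block carries its own data.
    sfile_link: Optional[str] = None


def verify_chunk(
    blocks: List[DataBlock],
    integrity_table: Dict[str, bool],
    verify_block: Callable[[DataBlock], bool],
) -> List[bool]:
    """Return one verification status per block in the chunk."""
    statuses = []
    for block in blocks:
        if block.sfile_link is None:
            # No link: the block itself is verified directly.
            statuses.append(verify_block(block))
        else:
            # Link present: reuse the container's recorded status
            # (assumption: a missing entry means "not verified").
            statuses.append(integrity_table.get(block.sfile_link, False))
    return statuses
```

In this sketch, only blocks without a link trigger the (potentially expensive) `verify_block` call; linked blocks cost a single table lookup.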
Abstract
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
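The "verified only once" property described in the abstract can be illustrated with a small memoization sketch. The abstract does not say how a block or container is verified, so the SHA-256 comparison here is an assumption, as are the class name `VerifyOnce`, the in-memory `status` dictionary standing in for the chunk integrity table, and the `hash_count` counter added purely to make the savings visible.

```python
import hashlib
from typing import Dict


class VerifyOnce:
    """Caches per-container verification results (hypothetical sketch)."""

    def __init__(self, containers: Dict[str, bytes], expected: Dict[str, str]):
        self.containers = containers      # container ID -> stored bytes
        self.expected = expected          # container ID -> expected SHA-256 hex digest
        self.status: Dict[str, bool] = {}  # stand-in for the chunk integrity table
        self.hash_count = 0               # number of full verifications performed

    def is_verified(self, container_id: str) -> bool:
        if container_id not in self.status:
            # First reference: do the expensive check and record the result.
            digest = hashlib.sha256(self.containers[container_id]).hexdigest()
            self.hash_count += 1
            self.status[container_id] = digest == self.expected[container_id]
        # Later references reuse the recorded result.
        return self.status[container_id]
```

If five references resolve to two distinct containers, this sketch hashes each container once, so `hash_count` ends at 2 rather than 5.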
77 Citations
20 Claims
1. A networked information management system configured to verify deduplication information, the networked information management system comprising:
- a storage manager comprising computer hardware configured to:
retrieve, from an electronically stored deduplication database, a chunk integrity table, wherein the chunk integrity table identifies a first data chunk, wherein the first data chunk is associated with a plurality of single instance file (SFile) containers, and wherein the chunk integrity table stores, for each SFile container in the plurality of SFile containers, an indication of whether the respective SFile container is verified;
retrieve, from a secondary storage subsystem, the first data chunk, wherein the first data chunk comprises one or more data blocks;
for each data block in the first data chunk, determine whether the respective data block comprises a link to one SFile container in the plurality of SFile containers;
perform a verification of the respective data block in response to a determination that the respective data block does not comprise the link; and
determine a verification status of the respective data block based on the indications of whether the SFile containers are verified stored in the chunk integrity table in response to a determination that the respective data block comprises the link.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-implemented method for verifying deduplication information, the computer-implemented method comprising:
retrieving, from an electronically stored deduplication database, a chunk integrity table, wherein the chunk integrity table identifies a first data chunk, wherein the first data chunk is associated with a plurality of single instance file (SFile) containers, and wherein the chunk integrity table stores, for each SFile container in the plurality of SFile containers, an indication of whether the respective SFile container is verified;
retrieving, from a secondary storage subsystem, the first data chunk, wherein the first data chunk comprises one or more data blocks;
for each data block in the first data chunk, determining whether the respective data block comprises a link to one SFile container in the plurality of SFile containers;
performing a verification of the respective data block in response to a determination that the respective data block does not comprise the link; and
determining a verification status of the respective data block based on the indications of whether the SFile containers are verified stored in the chunk integrity table in response to a determination that the respective data block comprises the link.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification