Method and system for assuring integrity of deduplicated data
Abstract
The present invention provides a system and method for assuring the integrity of deduplicated data objects stored within a storage system. A data object is copied to secondary storage media, and a digital signature, such as a checksum, is generated for the data object. Deduplication is then performed upon the data object, and the data object is split into chunks. The chunks are combined when the data object is subsequently accessed, and a signature is generated for the reassembled data object. The reassembled data object is provided if the newly generated signature is identical to the originally generated signature; otherwise, a backup copy of the data object is provided from the secondary storage media.
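The verify-on-access step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `signature` and `provide_object` are hypothetical, and SHA-256 is assumed here only as one possible "digital signature such as a checksum".

```python
import hashlib


def signature(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the "digital signature such as a checksum".
    return hashlib.sha256(data).hexdigest()


def provide_object(chunks, original_signature, backup_copy):
    """Reassemble the chunks and return them only if the signature still matches."""
    reassembled = b"".join(chunks)
    if signature(reassembled) == original_signature:
        return reassembled
    # Mismatch: the deduplicated copy is not trusted, so serve the backup copy.
    return backup_copy


original = b"hello deduplicated world"
sig = signature(original)

# Intact chunks reassemble to the original object.
good = provide_object([b"hello ", b"deduplicated ", b"world"], sig, original)

# A damaged chunk changes the signature, so the backup copy is returned instead.
bad = provide_object([b"hello ", b"corrupted ", b"world"], sig, original)
```

In both calls the caller receives the original bytes; the difference is whether they come from the reassembled chunks or from the backup copy.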
23 Claims
1. A method in a computer system for assuring integrity of deduplicated data, comprising:
copying a data object within a data system to a backup storage media;
generating an original object signature of the data object;
storing the original object signature of the data object in an index;
performing deduplication upon the data object, including dividing the data object into a set of one or more data chunks, and for each data chunk:
determining if a previously stored identical copy of the data chunk exists on a primary storage media;
storing the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk does not exist on the primary storage media; and
creating a pointer to the previously stored identical copy of the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk exists on the primary storage media;
assembling the deduplicated data object into a reassembled state responsive to said data object being accessed by the computer system, wherein during a restore or a storage audit operation, the set of one or more data chunks produced from the deduplicated data object are re-combined into a single data object;
generating a reassembled object signature for the reassembled data object;
comparing the reassembled object signature with the original object signature associated with the data object stored in the index;
providing the reassembled data object if the reassembled object signature matches the original object signature; and
providing the data object stored on the backup storage media if the reassembled object signature of the reassembled data object does not match the original object signature.
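The per-chunk deduplication step recited in claim 1 might look roughly like the following sketch. Several details are assumptions not stated in the claim: fixed-size chunking, an in-memory dictionary standing in for the primary storage media, and a SHA-256 digest used both to detect an identical stored chunk and to serve as the pointer. The names `deduplicate` and `reassemble` are hypothetical.

```python
import hashlib


def deduplicate(data: bytes, chunk_store: dict, chunk_size: int = 4) -> list:
    """Split data into chunks and return a list of pointers into chunk_store."""
    pointers = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumption: identical chunks are detected by comparing digests.
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            # No identical copy exists yet, so store the chunk itself.
            chunk_store[digest] = chunk
        # Simplification: a pointer is recorded for new and existing chunks alike.
        pointers.append(digest)
    return pointers


def reassemble(pointers: list, chunk_store: dict) -> bytes:
    """Follow the pointers in order and re-combine the chunks into one object."""
    return b"".join(chunk_store[p] for p in pointers)


store = {}
ptrs = deduplicate(b"abcdabcdwxyz", store)
```

Here the twelve-byte object yields three pointers but only two stored chunks, because the second `abcd` chunk resolves to the copy already in the store.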
12. A computer program product comprising a computer useable medium having a computer readable program for assuring integrity of deduplicated data, wherein the computer readable program when executed on a computer causes the computer to:
copy a data object within a data system to a backup storage media;
generate an original object signature of the data object;
store the original object signature of the data object in an index;
perform deduplication upon the data object, including dividing the data object into a set of one or more data chunks, and for each data chunk:
determining if a previously stored identical copy of the data chunk exists on a primary storage media;
storing the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk does not exist on the primary storage media; and
creating a pointer to the previously stored identical copy of the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk exists on the primary storage media;
assemble the deduplicated data object into a reassembled state responsive to said data object being accessed by the computer system, wherein during a restore or a storage audit operation, the set of one or more data chunks produced from the deduplicated data object are re-combined into a single data object;
generate a reassembled object signature for the reassembled data object;
compare the reassembled object signature with the original object signature associated with the data object stored in the index;
provide the reassembled data object if the reassembled object signature matches the original object signature; and
provide the data object stored on the backup storage media if the reassembled object signature of the reassembled data object does not match the original object signature.
17. A system, comprising:
at least one processor; and
at least one memory storing instructions operable with the at least one processor for assuring integrity of deduplicated data, the instructions being executed for:
copying a data object within a data system to a backup storage media;
generating an original object signature of the data object, including dividing the data object into a set of one or more data chunks, and for each data chunk:
determining if a previously stored identical copy of the data chunk exists on a primary storage media;
storing the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk does not exist on the primary storage media; and
creating a pointer to the previously stored identical copy of the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk exists on the primary storage media;
storing the original object signature of the data object in an index;
performing deduplication upon the data object;
assembling the deduplicated data object into a reassembled state responsive to said data object being accessed by the computer system, wherein during a restore or a storage audit operation, the set of one or more data chunks produced from the deduplicated data object are re-combined into a single data object;
generating a reassembled object signature for the reassembled data object;
comparing the reassembled object signature with the original object signature associated with the data object stored in the index;
providing the reassembled data object if the reassembled object signature matches the original object signature; and
providing the data object stored on the backup storage media if the reassembled object signature of the reassembled data object does not match the original object signature.
22. A method in a computer system for assuring integrity of deduplicated data, comprising:
copying a data object within a data system to a backup storage media;
generating an original object signature of the data object;
storing the original object signature of the data object in an index;
performing deduplication upon the data object, including dividing the data object into a set of one or more data chunks, and for each data chunk:
determining if a previously stored identical copy of the data chunk exists on a primary storage media;
storing the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk does not exist on the primary storage media; and
creating a pointer to the previously stored identical copy of the data chunk on the primary storage media in response to determining that a previously stored identical copy of the data chunk exists on the primary storage media;
assembling the deduplicated data object into a reassembled state responsive to said data object being accessed by the computer system, wherein during a restore or a storage audit operation, the set of one or more data chunks produced from the deduplicated data object are re-combined into a single data object;
generating a reassembled object signature for the reassembled data object;
comparing the reassembled object signature with the original object signature associated with the data object stored in the index; and
providing the reassembled data object if the reassembled object signature matches the original object signature.
23. The method as claimed in claim 22, further comprising providing notification if the reassembled object signature does not match the original object signature.
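As a rough illustration of the notification in claim 23, the sketch below logs a warning when the signatures differ. The claim does not say what form the notification takes; using Python's standard `logging` module, the logger name `dedup.integrity`, the SHA-256 signature, and the function name `check_and_notify` are all assumptions made for this example.

```python
import hashlib
import logging

# Hypothetical logger name; any notification channel could be substituted.
logger = logging.getLogger("dedup.integrity")


def check_and_notify(name: str, reassembled: bytes, original_signature: str) -> bool:
    """Return True on a match; emit a warning (the notification) on a mismatch."""
    # Assumption: signatures are SHA-256 hex digests of the object bytes.
    matches = hashlib.sha256(reassembled).hexdigest() == original_signature
    if not matches:
        logger.warning("integrity check failed for object %r", name)
    return matches
```

A caller could use the boolean result to decide whether to hand the reassembled object to the requester.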
Specification