Enhanced method and system for assuring integrity of deduplicated data
4 Assignments
0 Petitions
Abstract
The present invention provides an enhanced method and system for assuring the integrity of deduplicated data objects stored within a storage system. A digital signature of the data object is generated to determine whether the data object reassembled from a deduplicated state is identical to its pre-deduplication state. In one embodiment, generating the object signature of a data object before deduplication comprises generating the object signature from intermediate hash values computed by a hash function operating on each data chunk within the data object; this is the same hash function that is used to identify duplicate data chunks. In an alternative embodiment, generating the object signature of a data object before deduplication comprises generating the object signature from a portion of each data chunk of the data object.
236 Citations
19 Claims
1. An enhanced method in a computer system for assuring integrity of deduplicated data in a data system, comprising:
performing deduplication upon a data object by dividing the data object into a set of one or more data chunks, and, for each data chunk:
  inputting the data chunk into a hash function;
  performing the hash function upon a predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
  performing the hash function on the remainder of the data chunk;
  computing a hash value for the data chunk used for determining whether a duplicate of the data chunk exists in the data system; and
  deduplicating the data chunk based on the computed hash value for each data chunk;
generating an original object signature of the data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the data object;
storing the original object signature in an index;
assembling the deduplicated data object into a reassembled state responsive to said data object being accessed;
dividing the reassembled data object into a set of one or more data chunks, and, for each data chunk:
  inputting the data chunk into the hash function; and
  performing the hash function on the predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
generating a reassembled object signature of the reassembled data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the reassembled data object;
comparing the reassembled object signature with the original object signature associated with the data object stored in the index; and
providing the reassembled data object if the reassembled object signature matches the original object signature.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)

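The method of claim 1 can be sketched in Python as follows. This is a minimal illustration under assumptions the claims do not state. SHA-256 serves as the hash function, and objects are split into fixed-size 4096-byte chunks. The "predetermined size" is taken to be 64 bytes, and SHA-256 is also used as the checksum over the intermediate values. `DedupStore`, `hash_chunk`, `object_signature`, `split_into_chunks`, `PREFIX_SIZE` and `CHUNK_SIZE` are hypothetical names. The sketch shows how a single pass of the hash yields both values: the hash state is snapshotted with `copy()` after the prefix, then fed the remainder of the chunk to produce the deduplication key.

```python
import hashlib
from collections.abc import Iterable

PREFIX_SIZE = 64    # hypothetical "predetermined size", in bytes
CHUNK_SIZE = 4096   # hypothetical fixed chunk size; the claims do not fix a chunking method


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide a data object into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def hash_chunk(chunk: bytes) -> tuple[bytes, bytes]:
    """Return (intermediate_hash, full_hash) from one pass of SHA-256."""
    h = hashlib.sha256()
    h.update(chunk[:PREFIX_SIZE])
    intermediate = h.copy().digest()   # snapshot after the predetermined size
    h.update(chunk[PREFIX_SIZE:])      # continue hashing the remainder
    return intermediate, h.digest()    # full digest is used to find duplicates


def object_signature(intermediates: Iterable[bytes]) -> str:
    """Checksum (here SHA-256, an assumption) over the collected intermediate hashes."""
    s = hashlib.sha256()
    for value in intermediates:
        s.update(value)
    return s.hexdigest()


class DedupStore:
    """Hypothetical in-memory deduplicating store with an index of object signatures."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}        # full chunk hash -> chunk bytes, stored once
        self.recipes: dict[str, list[bytes]] = {}   # object name -> ordered chunk hashes
        self.signatures: dict[str, str] = {}        # the index: object name -> original signature

    def put(self, name: str, data: bytes) -> None:
        recipe, intermediates = [], []
        for chunk in split_into_chunks(data):
            intermediate, full = hash_chunk(chunk)
            intermediates.append(intermediate)
            self.chunks.setdefault(full, chunk)   # keep only the first copy of a duplicate
            recipe.append(full)
        self.recipes[name] = recipe
        self.signatures[name] = object_signature(intermediates)

    def get(self, name: str) -> bytes:
        # Reassemble the object from its stored chunks.
        data = b"".join(self.chunks[h] for h in self.recipes[name])
        # Re-divide it and recompute only the prefix hashes; SHA-256 of the prefix
        # alone equals the snapshot taken inside hash_chunk.
        intermediates = [hashlib.sha256(c[:PREFIX_SIZE]).digest()
                         for c in split_into_chunks(data)]
        if object_signature(intermediates) != self.signatures[name]:
            raise ValueError(f"integrity check failed for {name!r}")
        return data
```

In this sketch the object signature covers only the first `PREFIX_SIZE` bytes of each chunk. It therefore catches missing, extra, or substituted chunks whose prefixes differ from the originals. It does not catch corruption that lies entirely past a chunk's prefix, nor reordering of chunks whose prefixes are identical.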
9. An enhanced method in a computer system for assuring integrity of deduplicated data in a data system, comprising:
performing deduplication upon a data object by dividing the data object into a set of one or more data chunks and deduplicating the data chunks;
generating an original object signature of the data object by computing a checksum from a portion of each data chunk within the data object;
storing the original object signature in an index;
assembling the deduplicated data object into a reassembled state responsive to said data object being accessed;
generating a reassembled object signature of the reassembled data object by computing a checksum from a portion of each data chunk used to reassemble the reassembled data object;
comparing the reassembled object signature with the original object signature stored in the index; and
providing the reassembled data object if the reassembled object signature matches the original object signature.
(Dependent claims: 10, 11, 12, 13, 14, 15)

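Claim 9 covers the alternative in which the signature is a checksum over a portion of each chunk rather than over intermediate hash values. The following is a minimal sketch of that alternative. It assumes CRC-32 (Python's `zlib.crc32`) as the checksum and the first 32 bytes of each chunk as the sampled portion. The chunk store is modeled as a plain dict from chunk key to bytes. `portion_signature`, `reassemble_and_verify` and `PORTION_SIZE` are hypothetical names. Passing the previous result as the second argument of `zlib.crc32` continues a running CRC, so the signature reflects both the sampled bytes and the order of the chunks.

```python
import zlib

PORTION_SIZE = 32  # hypothetical: the claim only says "a portion" of each chunk


def portion_signature(chunks) -> int:
    """Running CRC-32 over the leading PORTION_SIZE bytes of each chunk, in order."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk[:PORTION_SIZE], crc)
    return crc


def reassemble_and_verify(chunk_store: dict, recipe: list, original_signature: int) -> bytes:
    """Fetch the chunks named in the recipe, check their signature, then return the object."""
    chunks = [chunk_store[key] for key in recipe]
    # The signature is computed from the chunks used for reassembly, so no re-chunking is needed.
    if portion_signature(chunks) != original_signature:
        raise ValueError("reassembled object signature does not match the index")
    return b"".join(chunks)
```

CRC-32 is fast but not collision-resistant, so it appears here only as an example checksum. A stronger checksum can be substituted without changing the structure of the sketch.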
16. A system, comprising:
at least one processor; and
at least one memory storing instructions operable with the at least one processor for assuring integrity of deduplicated data, the instructions being executed for:
  performing deduplication upon a data object by dividing the data object into a set of one or more data chunks, and, for each data chunk:
    inputting the data chunk into a hash function;
    performing the hash function upon a predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
    performing the hash function on the remainder of the data chunk;
    computing a hash value for the data chunk used for determining whether a duplicate of the data chunk exists in the data system; and
    deduplicating the data chunk based on the computed hash value for each data chunk;
  generating an original object signature of the data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the data object;
  storing the original object signature in an index;
  assembling the deduplicated data object into a reassembled state responsive to said data object being accessed;
  dividing the reassembled data object into a set of one or more data chunks, and, for each data chunk:
    inputting the data chunk into the hash function; and
    performing the hash function on the predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
  generating a reassembled object signature of the reassembled data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the reassembled data object;
  comparing the reassembled object signature with the original object signature associated with the data object stored in the index; and
  providing the reassembled data object if the reassembled object signature matches the original object signature.

17. A system, comprising:
at least one processor; and
at least one memory storing instructions operable with the at least one processor for assuring integrity of deduplicated data, the instructions being executed for:
  performing deduplication upon a data object by dividing the data object into a set of one or more data chunks and deduplicating the data chunks;
  generating an original object signature of the data object by computing a checksum from a portion of each data chunk within the data object;
  storing the original object signature in an index;
  assembling the deduplicated data object into a reassembled state responsive to said data object being accessed;
  generating a reassembled object signature of the reassembled data object by computing a checksum from a portion of each data chunk used to reassemble the reassembled data object;
  comparing the reassembled object signature with the original object signature stored in the index; and
  providing the reassembled data object if the reassembled object signature matches the original object signature.

18. A computer program product comprising a computer useable medium having a computer readable program for assuring integrity of deduplicated data, wherein the computer readable program when executed on a computer causes the computer to:
perform deduplication upon a data object by dividing the data object into a set of one or more data chunks, and, for each data chunk:
  inputting the data chunk into a hash function;
  performing the hash function upon a predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
  performing the hash function on the remainder of the data chunk;
  computing a hash value for the data chunk used for determining whether a duplicate of the data chunk exists in the data system; and
  deduplicating the data chunk based on the computed hash value for each data chunk;
generate an original object signature of the data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the data object;
store the original object signature in an index;
assemble the deduplicated data object into a reassembled state responsive to said data object being accessed;
divide the reassembled data object into a set of one or more data chunks, and, for each data chunk:
  inputting the data chunk into the hash function; and
  performing the hash function on the predetermined size of the data chunk to produce an intermediate hash value of the data chunk;
generate a reassembled object signature of the reassembled data object by computing a checksum from a collection of the intermediate hash values produced for the predetermined size of each data chunk within the reassembled data object;
compare the reassembled object signature with the original object signature associated with the data object stored in the index; and
provide the reassembled data object if the reassembled object signature matches the original object signature.

19. A computer program product comprising a computer useable medium having a computer readable program for assuring integrity of deduplicated data, wherein the computer readable program when executed on a computer causes the computer to:
perform deduplication upon a data object by dividing the data object into a set of one or more data chunks and deduplicating the data chunks;
generate an original object signature of the data object by computing a checksum from a portion of each data chunk within the data object;
store the original object signature in an index;
assemble the deduplicated data object into a reassembled state responsive to said data object being accessed;
generate a reassembled object signature of the reassembled data object by computing a checksum from a portion of each data chunk used to reassemble the reassembled data object;
compare the reassembled object signature with the original object signature stored in the index; and
provide the reassembled data object if the reassembled object signature matches the original object signature.

Specification