Data deduping in content centric networking manifests
First Claim
1. A computer-implemented method, comprising:
- selecting, by a computer system, a partitioning function that identifies a pattern that is expected to occur a predetermined number of times within a data block, wherein the data block corresponds to a file in a filesystem;
- processing a plurality of segments of the data block, using the partitioning function, to identify a set of chunk boundaries, wherein the partitioning function takes as input a segment s_i consisting of m consecutive bytes, wherein segment s_i starts at the i-th byte of the data block;
- generating a chunk for each portion of the data block between two consecutive chunk boundaries;
- generating one or more Manifests, wherein each Manifest includes a Content Object Hash (COH) value for each partitioned chunk;
- storing, by the computer system, each Manifest and the corresponding partitioned chunk in a storage repository, wherein two partitioned chunks with a common COH value are stored once in the storage repository; and
- determining that the file in the filesystem has been modified, and in response, the computer-implemented method further comprising:
- determining a portion of the file that has been modified;
- determining a nameless Content Object affected by the modification to the file based on the COH value;
- generating one or more new nameless Content Objects that include the modification to the file and are to replace the affected Content Object;
- storing the one or more nameless Content Objects in the storage repository; and
- updating, in one or more Manifests of a Manifest hierarchy, COH values corresponding to the modified portion of the file to replace the affected Content Object with the new nameless Content Objects, at the modified portion of the file, to achieve data deduping across multiple files.
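The steps of this claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `is_boundary`, `chunk`, `store_file`, and `update_file`, the segment length `M = 16`, and the 6-bit mask are all hypothetical. SHA-256 over the raw chunk bytes stands in for the Content Object Hash, the Manifest is a flat list rather than a hierarchy, and the update re-chunks the whole file instead of locating only the modified portion.

```python
import hashlib

M = 16        # assumed segment length m, in bytes
MASK = 0x3F   # assumed pattern: low 6 bits of the segment hash are zero


def is_boundary(segment: bytes) -> bool:
    """Hypothetical partitioning function applied to one m-byte segment s_i."""
    h = int.from_bytes(hashlib.sha256(segment).digest()[:4], "big")
    return (h & MASK) == 0


def chunk(data: bytes) -> list[bytes]:
    """Split data at the end of every segment that matches the pattern."""
    chunks, start = [], 0
    for i in range(len(data) - M + 1):   # segment s_i starts at byte i
        end = i + M
        if is_boundary(data[i:end]):
            chunks.append(data[start:end])
            start = end
    if start < len(data):                # trailing bytes after the last boundary
        chunks.append(data[start:])
    return chunks


def store_file(data: bytes, repo: dict[str, bytes]) -> list[str]:
    """Store each chunk under its hash and return a flat Manifest of hashes."""
    manifest = []
    for c in chunk(data):
        coh = hashlib.sha256(c).hexdigest()   # stand-in for the COH value
        repo.setdefault(coh, c)               # a chunk with a known COH is stored once
        manifest.append(coh)
    return manifest


def update_file(old_manifest: list[str], new_data: bytes,
                repo: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Re-chunk a modified file; only chunks with unseen hashes are added."""
    new_manifest = store_file(new_data, repo)
    kept = set(new_manifest)
    # Hashes the updated Manifest no longer references. The chunks stay in
    # the repository because other files may still point at them.
    replaced = [h for h in old_manifest if h not in kept]
    return new_manifest, replaced
```

Because each boundary decision depends only on the m bytes that precede the cut, a local edit changes only the chunks near it. Chunks farther away keep their hash values and are not stored again. A production system would likely replace the per-segment SHA-256 with a rolling hash to avoid rehashing every window.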
Abstract
A storage system facilitates deduping repeating data segments when generating a Manifest hierarchy for a file. During operation, the system can select a partitioning function that identifies a pattern that is expected to occur a predetermined number of times within the file. The system can process a plurality of segments of the file, using the partitioning function, to identify a set of chunk boundaries. The system generates a chunk for each file portion between two consecutive chunk boundaries, and generates a Manifest that includes a Content Object Hash (COH) value for each partitioned chunk. The system can store the Manifest and the unique partitioned chunks in a storage repository, such that two partitioned chunks with a common COH value are stored once in the storage repository.
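One way to read "a pattern that is expected to occur a predetermined number of times" is as a choice of how selective the partitioning function is. The sketch below assumes, hypothetically, that the function hashes each m-byte segment and declares a boundary when the low `k` bits of the hash are zero, and that hash values are uniformly distributed. Under those assumptions each segment matches with probability 2^-k. The function names and the power-of-two mask are illustrative and do not come from the source.

```python
import math


def mask_bits_for(n_bytes: int, m: int, target_boundaries: int) -> int:
    """Pick k so that about target_boundaries segments match a k-bit zero mask."""
    segments = n_bytes - m + 1            # number of m-byte segments s_i
    if segments <= 0 or target_boundaries <= 0:
        raise ValueError("need at least one segment and a positive target")
    # Each segment matches with probability 2**-k, so solve
    # segments / 2**k = target_boundaries for k and round to an integer.
    return max(0, round(math.log2(segments / target_boundaries)))


def expected_boundaries(n_bytes: int, m: int, k: int) -> float:
    """Expected number of matching segments under the uniform-hash assumption."""
    return (n_bytes - m + 1) / 2 ** k
```

For example, a 1 MiB file with 16-byte segments and a target of 256 boundaries gives `k = 12` under these assumptions, which corresponds to an average chunk size of about 4 KiB.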
20 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
- selecting a partitioning function that identifies a pattern that is expected to occur a predetermined number of times within a data block, wherein the data block corresponds to a file in a filesystem;
- processing a plurality of segments of the data block, using the partitioning function, to identify a set of chunk boundaries, wherein the partitioning function takes as input a segment s_i consisting of m consecutive bytes, wherein segment s_i starts at the i-th byte of the data block;
- generating a chunk for each portion of the data block between two consecutive chunk boundaries;
- generating one or more Manifests, wherein each Manifest includes a Content Object Hash (COH) value for each partitioned chunk;
- storing each Manifest and the corresponding partitioned chunk in a storage repository, wherein two partitioned chunks with a common COH value are stored once in the storage repository; and
- determining that the file in the filesystem has been modified, and in response, the method further comprising:
- determining a portion of the file that has been modified;
- determining a nameless Content Object affected by the modification to the file based on the COH value;
- generating one or more new nameless Content Objects that include the modification to the file and are to replace the affected Content Object;
- storing the one or more nameless Content Objects in the storage repository; and
- updating, in one or more Manifests of a Manifest hierarchy, COH values corresponding to the modified portion of the file to replace the affected Content Object with the new nameless Content Objects, at the modified portion of the file, to achieve data deduping across multiple files.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer system, comprising:
- a processor;
- a memory storing instructions that when executed by the processor cause the computer system to:
- select a partitioning function that identifies a pattern that is expected to occur a predetermined number of times within a data block, wherein the data block corresponds to a file in a filesystem;
- process a plurality of segments of the data block, using the partitioning function, to identify a set of chunk boundaries, wherein the partitioning function takes as input a segment s_i consisting of m consecutive bytes, wherein segment s_i starts at the i-th byte of the data block;
- generate a chunk for each portion of the data block between two consecutive chunk boundaries;
- generate one or more Manifests, wherein each Manifest includes a Content Object Hash (COH) value for each partitioned chunk;
- store each Manifest and the corresponding partitioned chunk in a storage repository, wherein two partitioned chunks with a common COH value are stored once in the storage repository; and
- determining that the file in the filesystem has been modified, and in response, the instructions further cause the computer system to:
- determine a portion of the file that has been modified;
- determine a nameless Content Object affected by the modification to the file based on the COH value;
- generate one or more new nameless Content Objects that include the modification to the file and are to replace the affected Content Object;
- store the one or more nameless Content Objects in the storage repository; and
- update, in one or more Manifests of a Manifest hierarchy, COH values corresponding to the modified portion of the file to replace the affected Content Object with the new nameless Content Objects, at the modified portion of the file, to achieve data deduping across multiple files.
Dependent claims: 16, 17, 18, 19, 20.
Specification