SEGMENT COMBINING FOR DEDUPLICATION
First Claim
1. A non-transitory computer-readable storage device comprising instructions that, when executed, cause one or more processors to:
- receive a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
- determine, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
- group the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
- choose, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
- combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
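The determine, group, and choose steps of the claim can be illustrated with a minimal sketch. This is not the patented implementation: it assumes the "first index" is a sparse in-memory dictionary from sampled hashes to store IDs, that a segment boundary is placed where the located store changes or a hypothetical `max_len` is reached, and that the store is chosen by majority vote. The names `locate`, `group_into_segments`, `choose_store`, and `store-new` are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical "first index": a sparse map from a sampled subset of chunk
# hashes to the ID of the store that holds a previously stored copy.
SparseIndex = Dict[str, str]


def locate(hashes: List[str], sparse_index: SparseIndex) -> List[Optional[str]]:
    """Return the known store for each hash, or None if the hash is not indexed."""
    return [sparse_index.get(h) for h in hashes]


def group_into_segments(locations: List[Optional[str]], max_len: int = 4) -> List[range]:
    """Split chunk positions into segments (assumed heuristic).

    A segment is closed when a located chunk points at a different store
    than the one already seen in the open segment, or at max_len chunks.
    """
    segments: List[range] = []
    start = 0
    current: Optional[str] = None  # store seen so far in the open segment
    for i, loc in enumerate(locations):
        store_changed = loc is not None and current is not None and loc != current
        if i > start and (store_changed or i - start >= max_len):
            segments.append(range(start, i))
            start, current = i, None
        if loc is not None:
            current = loc
    if start < len(locations):
        segments.append(range(start, len(locations)))
    return segments


def choose_store(segment: range, locations: List[Optional[str]],
                 default: str = "store-new") -> str:
    """Pick the store that holds the most located chunks of the segment."""
    votes = Counter(locations[i] for i in segment if locations[i] is not None)
    return votes.most_common(1)[0][0] if votes else default


hashes = ["a", "b", "c", "d", "e", "f"]
sparse_index = {"a": "S1", "c": "S1", "d": "S2", "f": "S2"}
locations = locate(hashes, sparse_index)
segments = group_into_segments(locations)
stores = [choose_store(s, locations) for s in segments]
```

Only a subset of the hashes ("a", "c", "d", "f") is present in the sparse index, which mirrors the claim's "for a subset of the sequence"; the unindexed chunks simply inherit the decision made for their segment.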
Abstract
A non-transitory computer-readable storage device includes instructions that, when executed, cause one or more processors to receive a sequence of hashes. Next, the one or more processors are further caused to determine locations of previously stored copies of a subset of the data chunks corresponding to the hashes. The one or more processors are further caused to group hashes and corresponding data chunks into segments based in part on the determined information. The one or more processors are caused to choose, for each segment, a store to deduplicate that segment against. Finally, the one or more processors are further caused to combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
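The final step of the abstract, combining segments bound for the same store and deduplicating them as a whole, might look like the sketch below. It rests on assumptions not stated in the source: the "second index" is modeled as a full per-store set of known hashes, all segments assigned to a store are merged regardless of adjacency, and the function names `combine_by_store` and `deduplicate` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Segment = List[str]  # the chunk hashes that make up one segment


def combine_by_store(assigned: List[Tuple[str, Segment]]) -> Dict[str, Segment]:
    """Merge all segments that were assigned to the same store, keeping order."""
    combined: Dict[str, Segment] = {}
    for store, segment in assigned:
        combined.setdefault(store, []).extend(segment)
    return combined


def deduplicate(combined: Dict[str, Segment],
                second_index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per store, the hashes whose chunks still need to be written.

    second_index is an assumed full index: store ID -> hashes already stored.
    It is updated in place so repeats inside the combined segment are
    written only once.
    """
    to_write: Dict[str, List[str]] = {}
    for store, hashes in combined.items():
        known = second_index.setdefault(store, set())
        new_hashes: List[str] = []
        for h in hashes:
            if h not in known:
                known.add(h)
                new_hashes.append(h)
        to_write[store] = new_hashes
    return to_write


assigned = [("S1", ["a", "b"]), ("S2", ["d"]), ("S1", ["c", "a"])]
second_index = {"S1": {"a"}, "S2": set()}
combined = combine_by_store(assigned)
to_write = deduplicate(combined, second_index)
```

In this sketch, handling the combined segment in one pass means the per-store index is consulted once per store rather than once per segment, which is one plausible reason for combining before deduplicating.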
15 Claims
1. A non-transitory computer-readable storage device comprising instructions that, when executed, cause one or more processors to:
- receive a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
- determine, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
- group the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
- choose, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
- combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
Dependent claims: 2, 3, 4, 5, 6
7. A method, comprising:
- receiving, by a processor, a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
- determining, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
- grouping the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
- choosing, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
- combining two or more segments chosen to be deduplicated against the same store and deduplicating them as a whole using a second index.
Dependent claims: 8, 9, 10, 11, 12
13. A device comprising:
- one or more processors;
- memory coupled to the one or more processors;
- the one or more processors to receive a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
- determine, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
- group the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
- choose, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
- combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
Dependent claims: 14, 15
Specification