SEGMENT COMBINING FOR DEDUPLICATION

US 20150066877A1
Filed: 05/01/2012
Published: 03/05/2015
Est. Priority Date: 05/01/2012
Status: Abandoned Application

Abstract

A non-transitory computer-readable storage device includes instructions that, when executed, cause one or more processors to receive a sequence of hashes. Next, the one or more processors are further caused to determine locations of previously stored copies of a subset of the data chunks corresponding to the hashes. The one or more processors are further caused to group hashes and corresponding data chunks into segments based in part on the determined information. The one or more processors are caused to choose, for each segment, a store to deduplicate that segment against. Finally, the one or more processors are further caused to combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
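The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names `choose_store` and `combine_by_store`, the dict that stands in for the result of consulting the first indexes, the tie-breaking rule, and the decision to merge only adjacent segments are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def choose_store(segment, locations):
    """Pick the store that appears to hold the most chunks of this segment.

    `locations` maps a chunk hash to the set of store ids believed to hold
    a copy (a hypothetical stand-in for the first-index lookup results).
    """
    votes = Counter()
    for chunk_hash in segment:
        votes.update(locations.get(chunk_hash, ()))
    if not votes:
        return None  # no known copies; the caller may pick any store
    # Assumed tie-break: highest vote count, then smallest store id.
    return min(votes, key=lambda store: (-votes[store], store))


def combine_by_store(segments, locations):
    """Merge runs of adjacent segments that were assigned the same store."""
    tagged = [(choose_store(seg, locations), seg) for seg in segments]
    combined = []
    for store, run in groupby(tagged, key=lambda pair: pair[0]):
        merged = [h for _, seg in run for h in seg]
        combined.append((store, merged))
    return combined


segments = [["a", "b"], ["c", "d"], ["e", "f"]]
locations = {"a": {1}, "b": {1, 2}, "c": {1}, "e": {2}, "f": {2}}
combined = combine_by_store(segments, locations)
```

Each `(store, hashes)` pair in `combined` would then be deduplicated as a whole against its store using the second index, which this sketch does not model.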

15 Claims

1. A non-transitory computer-readable storage device comprising instructions that, when executed, cause one or more processors to:
- receive a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
  
  determine, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
  
  group the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
  
  choose, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
  
  combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The device of claim 1, wherein the one or more first indexes are Bloom filters or sets.
  - 3. The device of claim 1, wherein the second index is a sparse index.
  - 4. The device of claim 1, wherein choosing causes the one or more processors to choose for a given segment based in part on which stores the determined information indicates already have the most data chunks belonging to that segment.
  - 5. The device of claim 1, wherein combining causes the one or more processors to combine a predetermined number of segments.
  - 6. The device of claim 1, wherein combining causes the one or more processors to concatenate segments together until a minimum size is reached.
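Claims 5 and 6 name two ways of deciding how many segments to combine. The sketch below illustrates both under assumptions that are not in the source: the input segments have already been chosen for the same store, segment size is measured in chunk count rather than bytes, and the function names are hypothetical.

```python
def combine_fixed_count(segments, count):
    """Claim 5 style: merge every `count` consecutive segments."""
    return [
        [h for seg in segments[i:i + count] for h in seg]
        for i in range(0, len(segments), count)
    ]


def combine_until_min_size(segments, min_chunks):
    """Claim 6 style: concatenate segments until `min_chunks` is reached."""
    combined, current = [], []
    for seg in segments:
        current.extend(seg)
        if len(current) >= min_chunks:
            combined.append(current)
            current = []
    if current:
        combined.append(current)  # trailing remainder below the minimum
    return combined


same_store_segments = [[1, 2], [3], [4, 5, 6], [7]]
by_count = combine_fixed_count(same_store_segments, 2)
by_size = combine_until_min_size(same_store_segments, 3)
```

How to handle a trailing remainder smaller than the minimum is a design choice the claims do not address; here it is simply emitted as its own combined segment.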

7. A method, comprising:
- receiving, by a processor, a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
  
  determining, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
  
  grouping the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
  
  choosing, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
  
  combining two or more segments chosen to be deduplicated against the same store and deduplicating them as a whole using a second index.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The method of claim 7, wherein the one or more first indexes are Bloom filters.
  - 9. The method of claim 7, wherein the second index is a sparse index.
  - 10. The method of claim 7, wherein choosing comprises choosing for a given segment based in part on which stores the determined information indicates already have the most data chunks belonging to that segment.
  - 11. The method of claim 7, wherein combining two or more segments comprises combining a predetermined number of segments.
  - 12. The method of claim 7, wherein combining two or more segments comprises concatenating segments together until a minimum size is reached.
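Claim 8 names Bloom filters as the first indexes. As a rough illustration of how such a filter could answer "which stores may already hold this chunk?", the sketch below keeps one small filter per store. The `BloomFilter` class, its parameters, the use of SHA-256 to derive bit positions, and the `locate` helper are all assumptions made for this example, not details from the application.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def locate(chunk_hashes, filters):
    """Map each hash to the store ids whose filter reports a possible copy."""
    return {
        h: {sid for sid, bf in filters.items() if h in bf}
        for h in chunk_hashes
    }


h1 = hashlib.sha1(b"chunk one").digest()
h2 = hashlib.sha1(b"chunk two").digest()
filters = {1: BloomFilter(), 2: BloomFilter()}
filters[1].add(h1)
filters[2].add(h1)
filters[2].add(h2)
found = locate([h1, h2], filters)
```

A Bloom filter never misses an item that was added but can report false positives, so the locations it yields are hints about where copies probably reside rather than guarantees.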

13. A device comprising:
- one or more processors;
  
  memory coupled to the one or more processors;
  
  the one or more processors to receive a sequence of hashes, wherein data to be deduplicated has been partitioned into a sequence of data chunks and each hash is a hash of a corresponding data chunk;
  
  determine, using one or more first indexes and for a subset of the sequence, locations of previously stored copies of the subset's corresponding data chunks;
  
  group the sequence's hashes and corresponding data chunks into segments based in part on the determined information;
  
  choose, for each segment, a store to deduplicate that segment against based in part on the determined information about the data chunks that make up that segment;
  
  combine two or more segments chosen to be deduplicated against the same store and deduplicate them as a whole using a second index.
- Dependent claims (14, 15):
  - 14. The device of claim 13, wherein choosing causes the one or more processors to choose for a given segment based in part on which stores the determined information indicates already have the most data chunks belonging to that segment.
  - 15. The device of claim 13, wherein combining causes the one or more processors to concatenate segments together until a minimum size is reached.

Current Assignee: Hewlett Packard Enterprise Development LP (Hewlett-Packard Enterprise Company)
Original Assignee: Hewlett Packard Enterprise Development LP (Hewlett-Packard Enterprise Company)
Inventors: Lillibridge, Mark D.; Bhagwat, Deepavali M.
Application Number: US14/395,492
Publication Number: US 20150066877A1
US Class Current: 707/692
CPC Class Codes:
G06F 16/1752   based on file chunks
G06F 3/0608   Saving storage space on sto...
G06F 3/0641   De-duplication techniques
G06F 3/067   Distributed or networked st...
