SCALABLE SEGMENT-BASED DATA DE-DUPLICATION SYSTEM AND METHOD FOR INCREMENTAL BACKUPS
First Claim
1. A scalable segment-based data de-duplication system for incremental backups, comprising:
- a master device on a secondary-storage node side that receives at least a plurality of incremental changes, a plurality of fingerprints of a plurality of segments to be de-duplicated, mapping entities from logical block address to physical location;
wherein said master device further includes at least a distributer to distribute at least a de-duplication functionality to at least a slave device on a data node side, and performs data de-duplication on said plurality of segments via a way to cluster a plurality of fingerprints in a data locality unit called container for said plurality of incremental changes, varied sampling rates for said plurality of segments, and a per-segment summary structure to avoid unnecessary inputs or outputs involved in de-duplication.
Abstract
A system in accordance with exemplary embodiments may provide scalable segment-based data de-duplication for incremental backups. In the system, a master device on a secondary-storage node side may receive at least incremental changes, fingerprints, and mapping entities, and may distribute de-duplication functionality to at least a slave device. The master device performs data de-duplication on a plurality of segments by clustering a plurality of fingerprints in a data locality unit called a container for the incremental changes, by varying the sampling rates of the segments (a fixed sampling rate for stable segments and a lower sampling rate for a plurality of unstable target files of de-duplication), and by using a per-segment summary structure to avoid unnecessary I/Os involved in de-duplication.
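The varied sampling rates mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of SHA-1 as the fingerprint, the modulo-based sampling rule, and the example rates (1 in 4 for stable segments, 1 in 32 for unstable ones) are all assumptions made for illustration.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    # Assumption: SHA-1 stands in for whatever fingerprint function is used.
    return hashlib.sha1(block).digest()


def is_sampled(fp: bytes, rate: int) -> bool:
    # Keep roughly 1 in `rate` fingerprints, decided by the fingerprint's own
    # leading bits so the choice is deterministic across backup runs.
    return int.from_bytes(fp[:8], "big") % rate == 0


def build_sampled_index(segments, stable_rate=4, unstable_rate=32):
    """Build an in-memory index of sampled fingerprints.

    `segments` maps a segment id to (is_stable, list_of_blocks).
    Stable segments are sampled at a fixed rate; unstable ones are sampled
    less often (a larger divisor means a lower sampling rate).
    """
    index = {}
    for seg_id, (is_stable, blocks) in segments.items():
        rate = stable_rate if is_stable else unstable_rate
        for block in blocks:
            fp = fingerprint(block)
            if is_sampled(fp, rate):
                index[fp] = seg_id
    return index
```

Because 32 is a multiple of 4 in this sketch, any fingerprint sampled at the lower rate would also be sampled at the higher rate, so lowering the rate for an unstable segment only shrinks its footprint in the index.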
85 Citations
21 Claims
1. A scalable segment-based data de-duplication system for incremental backups, comprising:
- a master device on a secondary-storage node side that receives at least a plurality of incremental changes, a plurality of fingerprints of a plurality of segments to be de-duplicated, mapping entities from logical block address to physical location;
wherein said master device further includes at least a distributer to distribute at least a de-duplication functionality to at least a slave device on a data node side, and performs data de-duplication on said plurality of segments via a way to cluster a plurality of fingerprints in a data locality unit called container for said plurality of incremental changes, varied sampling rates for said plurality of segments, and a per-segment summary structure to avoid unnecessary inputs or outputs involved in de-duplication.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A scalable segment-based data de-duplication method for incremental backups, executed by a master device on a secondary-storage node side, and comprising:
- receiving at least a plurality of incremental changes, a plurality of fingerprints of a plurality of input segments to be de-duplicated, mapping entities from logical block address to physical location;
- clustering said plurality of fingerprints in a data locality unit called container for the incremental changes;
- assigning varied sampling rates for said plurality of segments; and
- constructing a per-segment summary structure to avoid unnecessary inputs or outputs involved in the data de-duplication.
Dependent claims: 17, 18, 19, 20, 21
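The container and per-segment summary structure recited in claim 16 can be sketched as follows. This is an illustrative assumption, not the claimed design: the claim does not name the summary structure, so a small Bloom filter is used here as one plausible choice, and the class names, sizes, and the `reads` counter that simulates container I/O are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter used as a stand-in per-segment summary (assumption)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # bit array stored in a Python int

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: bytes) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


@dataclass
class Segment:
    summary: BloomFilter = field(default_factory=BloomFilter)
    # Container: fingerprints clustered together with their physical locations.
    container: dict = field(default_factory=dict)
    reads: int = 0  # counts simulated container fetches

    def store(self, fp: bytes, location: int):
        self.summary.add(fp)
        self.container[fp] = location

    def lookup(self, fp: bytes):
        if not self.summary.might_contain(fp):
            return None  # definite miss: no container I/O needed
        self.reads += 1  # possible hit: fetch the container
        return self.container.get(fp)
```

A lookup that the summary rules out returns without touching the container, which is the sense in which such a structure avoids unnecessary inputs or outputs; a false positive costs one extra container read but never yields a wrong location.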
Specification