SYSTEM AND METHOD FOR DATA DEDUPLICATION FOR DISK STORAGE SUBSYSTEMS
Abstract
A method for data deduplication includes the following steps. First, an original data set is segmented into a plurality of data segments. Next, the data in each data segment is transformed into a transformed data representation that has a band-type structure comprising a plurality of bands. A first set of bands, which includes non-identical transformed data for each data segment, is then selected, grouped together, and stored with the original data set. A second set of bands, which includes identical transformed data for each data segment, is selected and grouped together. A hash function is then applied to the transformed data of the second set of bands, generating transformed data segments indexed by hash function indices. Finally, the hash function indices and the transformed data representation of one representative data segment are stored in a deduplication database.
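The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the transform, the hash function, or the segment size, so everything concrete here is an assumption made for illustration: a toy pairwise average/difference transform producing two bands (`low` and `high`), SHA-256 as the hash, fixed-size segments of even length, and a plain dictionary standing in for the deduplication database. The function names are hypothetical.

```python
import hashlib


def segment(data: bytes, size: int) -> list[bytes]:
    """Split the original data set into fixed-size segments (size is assumed)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def transform(seg: bytes) -> dict[str, tuple[int, ...]]:
    """Toy two-band transform (an assumption, not taken from the source).

    Adjacent byte pairs are mapped to a floor average ("low" band) and a
    difference ("high" band). Segments of even length are assumed.
    """
    pairs = list(zip(seg[0::2], seg[1::2]))
    return {
        "low": tuple((a + b) // 2 for a, b in pairs),
        "high": tuple(a - b for a, b in pairs),
    }


def deduplicate(data: bytes, size: int = 4):
    """Return (database, records) for the given data set.

    database: hash index -> transformed representation of one representative
    records:  per segment, (hash index, first set of bands)
    """
    database: dict[str, dict[str, tuple[int, ...]]] = {}
    records: list[tuple[str, tuple[int, ...]]] = []
    for seg in segment(data, size):
        bands = transform(seg)
        first_set = bands["high"]   # bands that may differ between segments
        second_set = bands["low"]   # bands expected to match between segments
        index = hashlib.sha256(repr(second_set).encode()).hexdigest()
        database.setdefault(index, bands)   # keep only the first representative
        records.append((index, first_set))  # kept alongside the data set
    return database, records


if __name__ == "__main__":
    sample = bytes([1, 3, 5, 7, 2, 2, 6, 6, 9, 9, 9, 9])
    db, recs = deduplicate(sample, size=4)
    print(len(recs), "segments,", len(db), "database entries")
```

In this sketch the first two sample segments differ only in their `high` band, so they share one hash index and one database entry, while the third segment gets its own.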
20 Claims
1. A method for data deduplication comprising:
- segmenting an original data set into a plurality of data segments;
- transforming the data in each data segment into a transformed data representation that comprises a band-type structure for each data segment, wherein said band-type structure comprises a plurality of bands;
- selecting a first set of bands, grouping them together and storing them with the original data set, wherein said first set of bands comprises non-identical transformed data for each data segment;
- selecting a second set of bands and grouping them together, wherein said second set of bands comprises identical transformed data for each data segment;
- applying a hash function onto the transformed data of the second set of bands and thereby generating transformed data segments indexed by hash function indices;
- storing the hash function indices and the transformed data representation of one representative data segment in a deduplication database.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for data deduplication comprising:
- segmenting an original data set into a plurality of data segments;
- transforming the data in each data segment into a transformed data representation;
- removing one or more data from the transformed data representations of each data segment, wherein said removed data comprise non-identical data, and thereby resulting with identical remaining transformed data representations for each data segment;
- applying a hash function onto the remaining identical transformed data representations and thereby generating transformed data representations indexed by hash function indices;
- for each of the transformed data segments with the identical transformed data representations, storing the hash function indices and the transformed data representation of one representative data segment in a deduplication database.
Dependent claims: 9, 10, 11
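The removal-based variant in claim 8 can be sketched in a similar way. The claim does not say which data are removed or how they are identified, so this example assumes the caller already knows which positions hold non-identical data (the hypothetical `drop` parameter, for instance a per-segment sequence byte). The identity transform, SHA-256, and the dictionary database are likewise assumptions. The sketch also keeps the removed data in the per-segment records, which the claim does not require.

```python
import hashlib


def segment(data: bytes, size: int) -> list[bytes]:
    """Split the original data set into fixed-size segments (size is assumed)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate_by_removal(data: bytes, size: int, drop: set[int]):
    """Return (database, records) after removing the positions listed in `drop`.

    database: hash index -> full transformed representation of one representative
    records:  per segment, (hash index, removed data)
    """
    database: dict[str, tuple[int, ...]] = {}
    records: list[tuple[str, tuple[int, ...]]] = []
    for seg in segment(data, size):
        rep = tuple(seg)  # identity "transform" (an assumption for this sketch)
        remaining = tuple(v for i, v in enumerate(rep) if i not in drop)
        removed = tuple(v for i, v in enumerate(rep) if i in drop)
        # Hash only the data that remain after removal.
        index = hashlib.sha256(bytes(remaining)).hexdigest()
        database.setdefault(index, rep)   # one representative per index
        records.append((index, removed))  # kept here only for illustration
    return database, records


if __name__ == "__main__":
    # Each 4-byte segment starts with a varying byte followed by a payload.
    sample = b"\x01ABC\x02ABC\x03XYZ"
    db, recs = deduplicate_by_removal(sample, size=4, drop={0})
    print(len(recs), "segments,", len(db), "database entries")
```

Here the first two segments hash to the same index once their leading byte is removed, so only the first of them is stored as the representative.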
12. A system for data deduplication comprising:
a deduplication engine comprising
- means for segmenting an original data set into a plurality of data segments,
- means for transforming the data in each data segment into a transformed data representation,
- means for removing one or more data from the transformed data representations of each data segment, wherein said removed data comprise non-identical data and thereby resulting with identical remaining transformed data representations for each data segment,
- means for applying a hash function onto the remaining identical transformed data representations and thereby generating transformed data representations indexed by hash function indices, and
- means for storing the hash function indices and the transformed data representation of one representative data segment in a deduplication database, for each of the transformed data segments with the identical transformed data representations.
Dependent claims: 13, 14
15. A system for data deduplication comprising:
a deduplication engine comprising
- means for segmenting an original data set into a plurality of data segments,
- means for transforming the data in each data segment into a transformed data representation that comprises a band-type data structure, wherein said band-type structure comprises a plurality of bands,
- means for selecting a first set of bands, grouping them together and storing them with the original data set, wherein said first set of bands comprise non-identical transformed data for each data segment,
- means for selecting a second set of bands and grouping them together, wherein said second set of bands comprise identical transformed data for each data segment,
- means for applying a hash function onto the transformed data of the second set of bands and thereby generating transformed data segments indexed by hash function indices, and
- means for storing the hash function indices and the transformed data representation of one representative data segment in a deduplication database.
Dependent claims: 16, 17, 18, 19, 20
Specification