OPTIMIZING DATA BLOCK SIZE FOR DEDUPLICATION
Abstract
Provided herein is technology relating to data deduplication and particularly, but not exclusively, to methods and systems for determining an optimal size of data blocks to use for backing up a data source. Also provided herein are systems for identifying duplicate data in data backup applications.
20 Claims
1. A method for determining an optimal data block size for deduplicating a file type, the method comprising:
a) constructing a function relating a plurality of compression ratios to a plurality of test data block sizes, wherein a compression ratio of the plurality of compression ratios is calculated by transforming a file of the file type using a deduplication technology and a test data block size of the plurality of test data block sizes;
b) determining a maximum compression ratio of the function; and
c) choosing a test data block size associated with the maximum compression ratio to be the optimal data block size for the file type.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13
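Steps a) through c) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes fixed-size blocks, SHA-256 fingerprints as the deduplication technology, and a compression ratio defined as original size divided by stored size. The names `dedup_ratio`, `optimal_block_size`, and `pointer_bytes` are hypothetical, as is the idea of charging a fixed metadata cost per block reference.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int, pointer_bytes: int = 0) -> float:
    """Compression ratio for one test block size (assumed definition:
    original bytes / bytes stored after deduplication)."""
    if not data:
        return 1.0
    unique_sizes = {}  # fingerprint -> size of the unique block
    references = 0     # number of block references in the file recipe
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique_sizes.setdefault(hashlib.sha256(block).digest(), len(block))
        references += 1
    # Assumption: each reference costs `pointer_bytes` of metadata.
    stored = sum(unique_sizes.values()) + pointer_bytes * references
    return len(data) / stored


def optimal_block_size(data: bytes, test_sizes, pointer_bytes: int = 0):
    # a) construct the function: test block size -> compression ratio
    ratios = {size: dedup_ratio(data, size, pointer_bytes) for size in test_sizes}
    # b) find the maximum ratio and c) pick the block size associated with it
    best = max(ratios, key=ratios.get)
    return best, ratios
```

For example, `optimal_block_size(sample, [512, 1024, 4096, 8192])` returns the winning size together with the full ratio table for a representative file `sample` of the file type. With `pointer_bytes=0` very small blocks tend to win; a nonzero metadata cost is one way to give the function an interior maximum.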
12. A data backup system comprising:
i) a table relating a plurality of file types to a plurality of optimal data block sizes;
ii) a deduplication technology;
iii) a functionality to receive a data source having a file type; and
iv) a processor configured to generate a plurality of data blocks from the data source, wherein each data block of the plurality of data blocks has a size that is the optimal data block size associated with the file type.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
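The elements of claim 12 can be sketched in the same way. This is an illustration under stated assumptions, not the claimed system: the table entries, the default block size, the use of the file extension as the file type, and the names `OPTIMAL_BLOCK_SIZE`, `block_size_for`, `generate_blocks`, and `backup` are all hypothetical.

```python
import hashlib
from pathlib import PurePath

# i) table relating file types to optimal block sizes (made-up values)
OPTIMAL_BLOCK_SIZE = {".txt": 4096, ".vmdk": 65536}
DEFAULT_BLOCK_SIZE = 8192  # assumption: fallback for file types not in the table


def block_size_for(filename: str) -> int:
    """Look up the block size for a file type, here taken from the extension."""
    suffix = PurePath(filename).suffix.lower()
    return OPTIMAL_BLOCK_SIZE.get(suffix, DEFAULT_BLOCK_SIZE)


def generate_blocks(filename: str, data: bytes):
    """iv) split the data source into blocks of the size tied to its file type.
    The final block may be shorter when the length is not a multiple."""
    size = block_size_for(filename)
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(filename: str, data: bytes, store: dict):
    """ii) + iii) receive a data source and deduplicate its blocks into `store`.
    Returns the list of fingerprints needed to rebuild the data."""
    recipe = []
    for block in generate_blocks(filename, data):
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        recipe.append(digest)
    return recipe
```

Restoring is then a matter of concatenating `store[d]` for each fingerprint `d` in the returned recipe.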
Specification