Method of and system for deduplicating backed up data in a client-server environment
First Claim
1. A method of deduplicating backed-up data, which comprises:
creating a backup table at each of a plurality of backup clients, wherein each said backup table comprises a list of files and respective file types to be backed up;
receiving at a backup server backup tables from said backup clients;
merging at said backup server said received backup tables to form a merged backup table;
sorting said merged backup table according to file type from a file type yielding a best deduplication ratio to a file type yielding a worst deduplication ratio to form a sorted backup table, wherein a deduplication ratio for an original file is calculated by dividing an amount of space that would be required to store said original file by an amount of space required to store a deduplicated version of said original file;
requesting the files listed in said sorted backup table, in order, from said backup clients;
deduplicating files received from said backup clients, in order, using deduplication parameters optimized according to file type, said deduplication parameters including a chunking technique, a hashing technique, and a hash collision resolution technique;
calculating an average deduplication ratio for each deduplicated file type by dividing a sum of the individual deduplication ratios achieved for files of said each deduplicated file type by a number of the files of said each deduplicated file type; and
updating the deduplication ratio for each deduplicated file type with said calculated average deduplication ratio.
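The server-side bookkeeping recited in the claim (merging, sorting, and updating per-type ratios) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the tuple layout `(client, path, file_type)`, and the `default_ratio` fallback for unseen file types are assumptions made for illustration. The sketch also assumes that a higher ratio means better deduplication, which follows from the claim's definition of the ratio as original size divided by deduplicated size.

```python
from collections import defaultdict


def merge_tables(client_tables):
    """Merge per-client backup tables into one list of (client, path, file_type)."""
    merged = []
    for client, table in client_tables.items():
        for path, file_type in table:
            merged.append((client, path, file_type))
    return merged


def sort_by_ratio(merged, type_ratio, default_ratio=1.0):
    """Order entries from the best (highest) to the worst (lowest) known ratio.

    default_ratio is a hypothetical fallback for file types with no history.
    """
    return sorted(
        merged,
        key=lambda entry: type_ratio.get(entry[2], default_ratio),
        reverse=True,
    )


def dedup_ratio(original_size, deduplicated_size):
    """Space needed for the original divided by space needed after deduplication."""
    return original_size / deduplicated_size


def update_type_ratios(type_ratio, per_file_ratios):
    """Replace each type's ratio with the mean of the ratios just achieved.

    per_file_ratios is an iterable of (file_type, ratio) pairs.
    """
    grouped = defaultdict(list)
    for file_type, ratio in per_file_ratios:
        grouped[file_type].append(ratio)
    for file_type, ratios in grouped.items():
        type_ratio[file_type] = sum(ratios) / len(ratios)
    return type_ratio
```

In this sketch, the server would call `merge_tables` on the tables received from the clients, request files in the order returned by `sort_by_ratio`, and call `update_type_ratios` once the per-file ratios of the run are known.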
1 Assignment
0 Petitions
Abstract
In a method of and a system for deduplicating backed-up data, backup clients create respective backup tables comprising a list of files and respective file types to be backed up. A backup server receives backup tables from the backup clients. The backup server merges the received backup tables to form a merged backup table. The backup server sorts the merged backup table according to file type from a file type yielding a best deduplication ratio to a file type yielding a worst deduplication ratio, thereby forming a sorted backup table. The backup server requests the files listed in the sorted backup table, in order, from the backup clients. The backup server deduplicates files received from the backup clients, in order, using deduplication parameters optimized according to file type. The method calculates an updated deduplication ratio for each deduplicated file type. Examples of deduplication parameters include chunking techniques and hashing techniques.
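As a rough illustration of deduplication parameters selected by file type, the following Python sketch uses fixed-size chunking, a hash chosen per type, and a byte-for-byte comparison to resolve hash collisions. These particular techniques, the `PARAMS` table, its chunk sizes, and the in-memory `store` are all assumptions chosen for brevity. The source states only that the parameters include a chunking technique, a hashing technique, and a hash collision resolution technique.

```python
import hashlib

# Hypothetical per-type parameters; the values are illustrative only.
PARAMS = {
    "doc": {"chunk_size": 4, "hash": "sha256"},
    "vmdk": {"chunk_size": 8, "hash": "sha1"},
}


def deduplicate(data, params, store):
    """Split data into fixed-size chunks and store only chunks not seen before.

    store maps a hash digest to the list of distinct chunks with that digest.
    Returns the number of bytes newly added to the store.
    """
    size = params["chunk_size"]
    added = 0
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        digest = hashlib.new(params["hash"], chunk).hexdigest()
        bucket = store.setdefault(digest, [])
        # Collision resolution: compare bytes, not just digests.
        if chunk not in bucket:
            bucket.append(chunk)
            added += len(chunk)
    return added
```

Dividing a file's original size by the bytes it added to the store gives the per-file deduplication ratio as the claim defines it, provided the file added at least one new chunk.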
178 Citations
1 Claim
Specification