De-duplication system and method thereof
First Claim
1. A de-duplication system comprising:
- a memory;
- a first storage device;
- a second storage device;
- a first processor, wherein the processor:
  - determines a calculation range of content input from a client terminal based upon a predetermined maximum chunk size and a predetermined minimum chunk size,
  - sets at least a first and second small calculation ranges, both the first and second small calculation ranges being smaller than the first calculation range,
  - sets the positions of windows for rolling hash calculation with respect to the first and second small calculation ranges at integral multiples of a width of each of the windows so that successive windows overlap, and
  - subjects the at least first and second small calculation ranges to a rolling hash calculation with shifting of the windows set to the first and second small calculation ranges based on parallel processing to form a cut-out chunk from the content; and
- a second processor communicatively coupled to the memory, the first storage device, the second storage device, and the first processor, wherein the second processor:
  - does not store the cut-out chunk into the first storage device when the chunk having the same contents as the cut-out chunk is already stored in the first storage device.
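The range arithmetic recited in the claim can be illustrated with a minimal sketch. This is one possible reading, not the patented implementation: the function names, the half-open interval convention, and the choice to make each small range a whole number of window widths long are all assumptions introduced here for illustration.

```python
def calculation_range(chunk_start, min_size, max_size, content_len):
    """Positions where a chunk boundary may fall, as a half-open interval.

    Assumption: a cut is only considered once the chunk has reached
    min_size bytes and must happen by max_size bytes.
    """
    lo = min(chunk_start + min_size, content_len)
    hi = min(chunk_start + max_size, content_len)
    return lo, hi


def small_ranges(lo, hi, parts, window):
    """Split [lo, hi) into at most `parts` small calculation ranges.

    Assumption: every small range except possibly the last spans an
    integral multiple of the window width.
    """
    if hi <= lo:
        return []
    total_windows = -(-(hi - lo) // window)   # ceiling division
    per_part = -(-total_windows // parts)     # windows per small range
    step = per_part * window
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]


def first_window(small_lo, window):
    """Byte span hashed for the first candidate cut position of a small range.

    The span reaches back into the preceding small range, so it overlaps
    that range's last window and no candidate position is skipped.
    """
    return small_lo - window, small_lo
```

With a 4096-byte minimum, a 16384-byte maximum, and a 48-byte window, `small_ranges(4096, 16384, 2, 48)` yields `(4096, 10240)` and `(10240, 16384)`, and the second range's first window covers bytes 10192 to 10240.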
Abstract
Chunk de-duplication performance is improved. A de-duplication system has a cut-out processing unit and a de-duplication processing unit. The cut-out processing unit receives content from a client terminal, determines a calculation range from a predetermined maximum chunk size and a predetermined minimum chunk size, divides the calculation range into at least two small calculation ranges, sets the positions of the windows for rolling hash calculation so that the calculation is continuous between the two small calculation ranges, and applies the rolling hash calculation to the at least two small calculation ranges in parallel, shifting the windows, to cut out a chunk from the content. The de-duplication processing unit does not store the cut-out chunk in a storage device when a chunk with the same contents is already stored there.
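The cut-out step described in the abstract can be sketched as follows. This is a hedged illustration only: the polynomial rolling hash, the constants `BASE` and `MOD`, the `mask` test for a boundary, and the function names are assumptions not taken from the source. Python threads are used merely to show how the small calculation ranges can be scanned independently; they do not by themselves give a speed-up for this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

BASE = 257              # assumed hash base
MOD = (1 << 61) - 1     # assumed hash modulus


def scan_small_range(data, lo, hi, window, mask):
    """Return the first cut position p in [lo, hi) whose window
    data[p - window:p] hashes to a boundary value, or None."""
    if lo >= hi:
        return None
    top = pow(BASE, window - 1, MOD)
    h = 0
    for b in data[lo - window:lo]:      # hash of the first window
        h = (h * BASE + b) % MOD
    p = lo
    while True:
        if h & mask == 0:
            return p
        p += 1
        if p >= hi:
            return None
        # Shift the window by one byte: drop the oldest, append the newest.
        h = ((h - data[p - 1 - window] * top) * BASE + data[p - 1]) % MOD


def cut_chunks(data, min_size=2048, max_size=8192, window=48,
               mask=0x7FF, parts=2):
    """Cut `data` into chunks, scanning `parts` small ranges concurrently."""
    assert window <= min_size < max_size
    chunks, start = [], 0
    with ThreadPoolExecutor(max_workers=parts) as pool:
        while start < len(data):
            lo = start + min_size
            hi = min(start + max_size, len(data))
            cut = hi                     # forced cut at max size or at the end
            if lo < hi:
                per = -(-(hi - lo) // parts)
                step = -(-per // window) * window   # multiple of the window width
                bounds = [(s, min(s + step, hi)) for s in range(lo, hi, step)]
                hits = pool.map(
                    lambda r: scan_small_range(data, r[0], r[1], window, mask),
                    bounds)
                # The earliest small range with a hit gives the cut position.
                cut = next((p for p in hits if p is not None), hi)
            chunks.append(data[start:cut])
            start = cut
    return chunks
```

Because each small range starts its first window inside the preceding range, every candidate position is examined exactly once, so scanning with several small ranges produces the same chunks as scanning the whole calculation range sequentially (`parts=1`).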
21 Citations
14 Claims
1. A de-duplication system (independent claim; full text under First Claim above). Dependent claims: 2, 3, 4, 5, 6, 7.
8. A de-duplication method in a de-duplication system which de-duplicates a chunk stored into a storage device, the de-duplication method comprising the steps of:
- inputting a content from a client terminal;
- determining a calculation range from a predetermined maximum chunk size and a predetermined minimum chunk size;
- dividing the calculation range into at least first and second small calculation ranges;
- setting the positions of windows for rolling hash calculation at integral multiples of a width of each of the windows so that successive windows overlap;
- subjecting the at least first and second small calculation ranges to a rolling hash calculation with shifting of the windows based on parallel processing to form a cut-out chunk from the content; and
- not storing the cut-out chunk into the storage device when the chunk having the same contents as those of the cut-out chunk is already stored into the storage device.
Dependent claims: 9, 10, 11, 12, 13, 14.
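The final step of the method, not storing a chunk whose contents are already stored, might look like the sketch below. The claim only requires detecting a chunk with the same contents; using a SHA-256 fingerprint as the lookup key, and the `ChunkStore` class with its in-memory dictionaries standing in for the storage device, are assumptions made for illustration.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory stand-in for the storage device."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes
        self.recipes = {}   # content name -> ordered list of fingerprints

    def put(self, chunk):
        """Store `chunk` unless an identical one exists.

        Returns (fingerprint, stored), where stored is False for a duplicate.
        """
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.chunks:
            return fp, False            # duplicate: nothing is written
        self.chunks[fp] = chunk
        return fp, True

    def store_content(self, name, chunks):
        """Record the content as a list of chunk fingerprints."""
        self.recipes[name] = [self.put(c)[0] for c in chunks]

    def restore(self, name):
        """Rebuild the original content from its recorded fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])
```

Two contents that share chunks then occupy the space of their distinct chunks only, while each can still be restored in full from its list of fingerprints.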
Specification