File comparison for data backup and file synchronization
First Claim
1. A system for comparing a first data set on a first storage device with a second data set on a second storage device, comprising:
- a transmission medium for transmitting data between the first storage device and the second storage device;
a control program that generates a first digital signature from the first data set and a second digital signature from the second data set, the control program including a function F that incrementally calculates position sensitive first and second digital signatures, wherein said first data set is defined by a sliding indexed window and said incremental calculation comprises data leaving the window, data entering the window, and a previous digital signature; and
a comparator for determining whether the first digital signature matches the second digital signature.
14 Assignments
0 Petitions
Accused Products
Abstract
File comparison employs a single function F to calculate a digital signature from data in a sliding window. The digital signature is both incrementally computable and position sensitive. In particular, F is computable without reprocessing each byte in the array when the window is advanced and facilitates detection of such changes as transposed bytes of data. The function F is defined by two qualities. First, for F(A+B), where A is an array, F(A+B)=F(A)+F(B). Second, given a concatenation operator "!" such that "0!A" indicates an array A with 0 inserted before A, the function F has the property that there is a function G such that F(0!A)=G(F(A'"'"'0)). Both polynomials and cyclic redundancy checks ("CRC") may be used as that class of function.
239 Citations
17 Claims
-
1. A system for comparing a first data set on a first storage device with a second data set on a second storage device, comprising:
-
a transmission medium for transmitting data between the first storage device and the second storage device; a control program that generates a first digital signature from the first data set and a second digital signature from the second data set, the control program including a function F that incrementally calculates position sensitive first and second digital signatures, wherein said first data set is defined by a sliding indexed window and said incremental calculation comprises data leaving the window, data entering the window, and a previous digital signature; and a comparator for determining whether the first digital signature matches the second digital signature. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method for calculating a position sensitive digital signature from data on a first storage medium for comparison with a digital signature that represents data on a second storage medium, comprising the steps of:
-
selecting the data in the first system with an indexed sliding window; applying a function F to the data within the window to generate the incremental position sensitive digital signature, wherein said function of F comprises input from data leaving said sliding indexed window, data entering said sliding indexed window, and a previously generated digital signature; and comparing the position sensitive digital signature with at least one digital signature representing data in the second system. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A method for providing an incremental backup of a first memory in a second memory wherein a set of different files have previously been stored in the second memory, comprising the steps of:
-
selecting a file from the first memory for examination; generating a signature from a portion of the file defined by a sliding indexed window; comparing the generated signature with signatures generated from the set of previously stored files having different filenames; determining the closest matching previously stored file, relative to the file under examination, where the stored file and the file under examination have different filenames, by identifying at least some portions of the stored file and the file under examination which are different, wherein said determining comprises applying a function F to the data within said sliding indexed window to generate an incremental position sensitive digital signature, wherein said function of F comprises input from data leaving said sliding indexed window, data entering said sliding indexed window, and a previously generated digital signature; and storing the portions of the file under examination identified as being different from the closest matching file in the second memory. - View Dependent Claims (12, 13)
-
-
14. A method for determining a minimum set of data compression units for restoring a file from a base copy and a plurality of revision elements, comprising the steps of:
-
selecting the base copy and revision elements required to build the file; sorting the selected base copy and revision elements into a list with the most recently generated revision element at the list head and the base copy at the list tail; reading information for each selected revision element into an array with five columns;
chunk, operation, data offset, data length, and target offset;creating an output array indicative of the reconstructed file with five columns;
revision element pointer, chunk, data offset, data length, and target offset;calling a recursive function for the most recent revision, requesting data offset 0 and the final file length, and passing in target offset 0, the recursive function iterating through the array columns and comparing the requested data offset and length with the target of offset and data length of each item, and in the case of a match, writing an entry into the output array if the array item is a data operation; optimizing, via a block filter, an optimal set of array elements wherein elements not affecting said output array are ignored; and sorting the output array by revision element and chunk; and transmitting the array followed by the transmission blocks for each data block. - View Dependent Claims (15, 16, 17)
-
Specification