Apparatus and method for reconstructing a file from a difference signature and an original file
First Claim
1. A method for producing a difference signature of differences between an original file and an updated version of the original file, comprising(1) creating a token table from an original file in a first storage device by producing a token set for each equal sized contiguous segment of said original file, each token set comprising a primary exclusive-or based token and at least one order sensitive secondary token or cyclic redundancy check product term;
- and(2) generating a difference signature, using the token table and an updated file, by;
(a) defining a window of consideration for the updated file, said window being of a size equivalent to the segment size used to create the token set for the original file and comprising successive characters in the updated file;
(b) calculating a primary exclusive-or based token for the window of consideration;
(c) searching the token table for a primary token which matches the primary token for the window and advancing to step 2(g) if said matching primary token is not found in the token table;
(d)(i) generating a secondary token for said window in response to finding in the token table a primary token which matches the primary token from the window and comparing the secondary token to the secondary token in the corresponding token set in the token table; and
(ii) advancing to step 2(e) if the secondary tokens match;
(e) logging the offset of the current window to the difference signature to correlate the relative locations of the matching segment in the original and updated files, in response to finding a match between the secondary token from the window and the corresponding secondary token from the token set;
(f) advancing the window of consideration by the segment size to the next segment after the matched text, if there are any remaining segments in the updated file, and resuming the method at step (2b) above;
(g) advancing the window of consideration by at least one character to create the next window, which includes the characters of the previous window and at least one character in the updated file following the previous window minus the equivalent number of characters at the beginning of the previous window, in response to a failed token search for the previous window, which occurs where either said primary token for the previous window of said updated file is not found in the token table for said original file or where at least one matching primary token is found in the token table but no matching secondary token corresponding to said at least one matching primary token is found in the token table;
(h) generating a primary token for said next window of consideration by adjusting the primary token from the previous window and(j) repeating the cycle of steps (2b) through (2i) until the updated file is exhausted.
6 Assignments
0 Petitions
Accused Products
Abstract
Invention maintains duplicate files in safe places. A SCAN computer program creates a TOKEN Table of an earlier file. The TOKEN Table reflects the indices of successive segments of the file and the exclusive-or (XR) and Cyclic redundancy check (CRC) products of the characters in each segment. An updated file is compared to the earlier file by comparing the XR and CRC products of segments in the updated file to the XR and CRC products in the TOKEN Table. On detecting matching products for identical segments, the next segments are compared. On mismatch, the segment (window) for the updated file is bumped one character and new XR and CRC products generated and compared. The indices of the TOKEN Table and the offsets from the start of the file of the first characters of the updated file matching segments are set forth in a Match Table. Next the updated file is scrolled through for the non-matching information determined by acting on the indices and offsets of the Match Table to form the TRANSITION Table which is the Match Table and the updated file non-matching information. The TRANSITION Table contains the delta information which may be sent to another location having a copy of the earlier file thereat: the whole updated file need not be sent there. A reconstruction program at the location looks at the TRANSITION Table to determine where to get the characters for the copy of the updated file it is creating.
-
Citations
28 Claims
-
1. A method for producing a difference signature of differences between an original file and an updated version of the original file, comprising
(1) creating a token table from an original file in a first storage device by producing a token set for each equal sized contiguous segment of said original file, each token set comprising a primary exclusive-or based token and at least one order sensitive secondary token or cyclic redundancy check product term; - and
(2) generating a difference signature, using the token table and an updated file, by; (a) defining a window of consideration for the updated file, said window being of a size equivalent to the segment size used to create the token set for the original file and comprising successive characters in the updated file; (b) calculating a primary exclusive-or based token for the window of consideration; (c) searching the token table for a primary token which matches the primary token for the window and advancing to step 2(g) if said matching primary token is not found in the token table; (d) (i) generating a secondary token for said window in response to finding in the token table a primary token which matches the primary token from the window and comparing the secondary token to the secondary token in the corresponding token set in the token table; and (ii) advancing to step 2(e) if the secondary tokens match; (e) logging the offset of the current window to the difference signature to correlate the relative locations of the matching segment in the original and updated files, in response to finding a match between the secondary token from the window and the corresponding secondary token from the token set; (f) advancing the window of consideration by the segment size to the next segment after the matched text, if there are any remaining segments in the updated file, and resuming the method at step (2b) above; (g) advancing the window of consideration by at least one character to create the next window, which includes the characters of the previous window and at least one character in the updated file following the previous window minus the equivalent number of characters at the beginning of the previous window, in response to a failed token search for the previous window, which occurs where either said primary token for the previous window of said updated file is not found in the token table for said original file or where at least one matching primary token is found in the token table but no matching secondary token corresponding to said at least one matching primary token is found in the token table; (h) generating a primary token for said next window of consideration by adjusting the primary token from the previous window and (j) repeating the cycle of steps (2b) through (2i) until the updated file is exhausted. - View Dependent Claims (3, 4, 5, 6, 7, 8, 12, 13, 14)
- and
-
2. A method for producing a difference signature of differences between an original file and an updated version of the original file when, in updating the original file, the majority of insertions and deletions of characters in segments of the original file are known to change the offsets of only those segments where said insertions and deletions have been made but do not change the offsets of adjacent segments, comprising;
-
(1) creating a token table from an original file in a first storage device by producing a token set for each equal sized contiguous segment of said original file, each token set comprising a primary exclusive-or based token and at least one order sensitive secondary token or cyclic redundancy check product term; and (2) generating a difference signature, using the token table and an updated file, by; (a) defining a window of consideration for the updated file, said window being of a size equivalent to the segment size used to create the token set for the original file and comprising successive characters in the updated file; (b) calculating a primary token for the window of consideration; (c) searching the token table for a primary token which matches the primary token for the window and advancing to step 2(g) if said matching primary token is not found in the token table; (d) (i) generating a secondary token for said window in response to finding in the token table a primary token which matches the primary token for the window and comparing the secondary token to the secondary token in the corresponding token set in the token table; and (ii) advancing to step 2(e) if the secondary tokens match; (e) logging the offset of the current window to the difference signature to correlate the relative locations of the matching segment in the original and updated files, in response to finding a match between the secondary token from the window and the corresponding secondary token from the token set; (f) advancing the window of consideration by the segment size to the next segment after the matched text, if there are any remaining segments in the updated file, and resuming the method at step (2b) above; (g) advancing the window of consideration by the segment size to the next segment to create the next window in response to a failed token search for the previous window, which occurs where either said primary token for the previous window of said updated file is not found in the token table for said original file or where at least one matching primary token is found in the token table but no matching secondary token corresponding to said at least one matching primary token is found in the token table; and (h) repeating the cycle of steps (2b) through (2g) until the updated file is exhausted. - View Dependent Claims (9, 10, 11, 15)
-
-
16. A method for recording differences between first and second computer data files in a memory media associated with a programmable data processor, said files having a plurality of fixed length segments, comprising the steps of:
-
(1) generating a token table in said memory media by (a) reading a fixed length segment of said data file into said memory media; (b) generating a primary exclusive-or term for the segment; (c) generating a secondary order sensitive term for the segment; (d) concatenating said primary exclusive-or term and said secondary order sensitive term into a token; (e) recording said token in said memory media; and (f) repeating steps (a)-(e) for each of said plurality of fixed length segments until all of said segments in said data file have been read and the token table contains one token for each segment in said data file; and (2) recording differences between the first and second computer data files, using the token table, by (a) defining a window of consideration for said second data file starting at the first character of said second data file, said window having the same number of characters as each of said plurality of fixed length segments of said first data file; (b) generating a window exclusive-or term for the window of consideration in the same manner that the primary exclusive-or term for each segment is generated; (c) searching the token table for a primary exclusive-or term matching the window exclusive-or term; (d) if a matching primary exclusive-or term is found in the token table, (i) generating a window order sensitive term for the characters in the window of consideration in the same manner that the secondary order sensitive term for each segment is generated; (ii) comparing said window order sensitive term with the secondary order sensitive term which forms part of the token corresponding to the matching primary exclusive-or term; and (iii) when the window exclusive-or term and the order sensitive term match the primary exclusive-or term and secondary order sensitive term of a respective token in the token table, recording information identifying the respective token and recording the offset of the window of consideration and the number of characters in the window of consideration into a difference signature in said memory media, and advancing the window of consideration by the length of a segment to beyond the last character in the current window of consideration and returning to step (2b); (e) if no match for said window exclusive-or term is found or if no match for the secondary order sensitive term for the window of consideration is found after completion of step (2d), then, (i) adjusting said window exclusive-or term to remove the exclusive-or representation of the first character of the window of consideration in said window exclusive-or term and to add the exclusive-or representation of the next character beyond the last character in the current window of consideration to the window exclusive-or term and advancing the window of consideration forward by one character; and (f) repeating steps (2c) through (2e) until the window of consideration reaches the end of the second data file. - View Dependent Claims (17, 18, 19, 20, 21, 22, 26, 27, 28)
-
-
23. A method for quickly recording differences between first and second data files having a plurality of fixed length segments in a memory media associated with a programmable data processor, when the majority of insertions in and deletions of characters in segments of the first file are known to change the offsets of only those segments where said insertions and deletion have been made but do not change the offsets of adjacent segments, comprising the steps of:
-
(1) generating a token table in said memory media by (a) reading a fixed length segment of said data file into said memory media; (b) generating a primary exclusive-or term for the segment; (c) generating a secondary order sensitive term; (d) concatenating said primary exclusive-or term and said secondary order sensitive term into a token; (e) recording said token in said memory media; and (f) repeating steps (a)-(e) for each of said plurality of fixed length segments until all of said segments in said data file have been read and the token table contains one token for each segment in said data file; and (2) recording differences between the first and second computer data files, using the token table, by (a) defining a window of consideration for said second data file starting at the first character of said second file, said windows having the same number of characters as each of said plurality of fixed length segments of said first data file; (b) generating a window exclusive-or term for the window of consideration in the same manner that the primary exclusive-or term for each segment is generated; (c) selecting a token from the token table at an index in the token table which is determined by dividing the offset of the window of consideration by the number of characters in a segment to obtain an index value and comparing said window exclusive-or term with the primary exclusive-or term of the selected token in the token table corresponding to the index for the segment containing said index value; (d) if a matching primary exclusive-or term is found in the token table, (i) generating a window order sensitive term for the characters in the window of consideration in the same manner that the secondary order sensitive term for each segment is generated; (ii) comparing said window order sensitive term with the secondary order sensitive term which forms part of the token corresponding to the matching primary exclusive-or term; and (iii) when the window exclusive-or term and the window order sensitive term match the primary exclusive-or term and secondary order sensitive term of a respective token in the token table, recording the index of the respective token and the offset of the first character of the window of consideration in a difference signature in said memory media; (e) advancing the window of consideration by the length of a segment to beyond the last character in the current window of consideration and returning to step (2b); and (f) repeating steps (2c) through (2e) until the window of consideration reaches the end of the second data file. - View Dependent Claims (24, 25)
-
Specification