Apparatus and method for reconstructing a file from a difference signature and an original file

US 5,479,654 A
Filed: 03/30/1993
Issued: 12/26/1995
Est. Priority Date: 04/26/1990
Status: Expired due to Term

Abstract

The invention maintains duplicate files in safe places. A SCAN computer program creates a TOKEN Table of an earlier file. The TOKEN Table reflects the indices of successive segments of the file and the exclusive-or (XR) and cyclic redundancy check (CRC) products of the characters in each segment. An updated file is compared to the earlier file by comparing the XR and CRC products of segments in the updated file to the XR and CRC products in the TOKEN Table. On detecting matching products for identical segments, the next segments are compared. On a mismatch, the segment (window) for the updated file is bumped one character, and new XR and CRC products are generated and compared. The indices of the TOKEN Table and the offsets, from the start of the file, of the first characters of the matching segments in the updated file are set forth in a Match Table. Next, the updated file is scrolled through for the non-matching information, determined by acting on the indices and offsets of the Match Table, to form the TRANSITION Table, which is the Match Table plus the non-matching information from the updated file. The TRANSITION Table contains the delta information, which may be sent to another location that holds a copy of the earlier file; the whole updated file need not be sent there. A reconstruction program at that location looks at the TRANSITION Table to determine where to get the characters for the copy of the updated file it is creating.
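
As a rough illustration of the TOKEN Table described in the abstract, the sketch below computes an XR and a CRC product for each fixed-size segment of a file. The names `xor_product` and `build_token_table`, the default segment size, the choice of `zlib.crc32` as the CRC, and the decision to skip a trailing partial segment are all assumptions made for illustration; the source does not fix any of them.

```python
import zlib
from functools import reduce
from operator import xor


def xor_product(segment: bytes) -> int:
    """Exclusive-or of every byte in the segment (insensitive to byte order)."""
    return reduce(xor, segment, 0)


def build_token_table(original: bytes, size: int = 8) -> list:
    """Return one (XR, CRC) pair per full-size contiguous segment.

    The list position of each pair is the segment index. A trailing partial
    segment is skipped here (an assumption of this sketch).
    """
    table = []
    for start in range(0, len(original) - size + 1, size):
        segment = original[start:start + size]
        table.append((xor_product(segment), zlib.crc32(segment)))
    return table


# Two segments holding the same bytes in a different order share an XR
# product but not a CRC, which is why the table stores both.
table = build_token_table(b"abcddcba", size=4)
```

The XR product is cheap to compute but ignores byte order, so the order-sensitive CRC is what separates segments that merely contain the same characters.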

28 Claims

1. A method for producing a difference signature of differences between an original file and an updated version of the original file, comprising (1) creating a token table from an original file in a first storage device by producing a token set for each equal sized contiguous segment of said original file, each token set comprising a primary exclusive-or based token and at least one order sensitive secondary token or cyclic redundancy check product term;
- and (2) generating a difference signature, using the token table and an updated file, by;
  
  (a) defining a window of consideration for the updated file, said window being of a size equivalent to the segment size used to create the token set for the original file and comprising successive characters in the updated file;
  
  (b) calculating a primary exclusive-or based token for the window of consideration;
  
  (c) searching the token table for a primary token which matches the primary token for the window and advancing to step 2(g) if said matching primary token is not found in the token table;
  
  (d)(i) generating a secondary token for said window in response to finding in the token table a primary token which matches the primary token from the window and comparing the secondary token to the secondary token in the corresponding token set in the token table; and
  
  (ii) advancing to step 2(e) if the secondary tokens match;
  
  (e) logging the offset of the current window to the difference signature to correlate the relative locations of the matching segment in the original and updated files, in response to finding a match between the secondary token from the window and the corresponding secondary token from the token set;
  
  (f) advancing the window of consideration by the segment size to the next segment after the matched text, if there are any remaining segments in the updated file, and resuming the method at step (2b) above;
  
  (g) advancing the window of consideration by at least one character to create the next window, which includes the characters of the previous window and at least one character in the updated file following the previous window minus the equivalent number of characters at the beginning of the previous window, in response to a failed token search for the previous window, which occurs where either said primary token for the previous window of said updated file is not found in the token table for said original file or where at least one matching primary token is found in the token table but no matching secondary token corresponding to said at least one matching primary token is found in the token table;
  
  (h) generating a primary token for said next window of consideration by adjusting the primary token from the previous window; and
  
  (j) repeating the cycle of steps (2b) through (2i) until the updated file is exhausted.
- - 3. The method of claim 1 or 2 comprising constraining the search for the primary token for the window to the token set representing a corresponding offset from the updated file in the token table.
  - 4. The method of claim 1 or 2 comprising limiting the search for the primary token in the token table to fewer than all token sets in the token table.
  - 5. The method of claim 4 further comprising creating lower and upper bounds of said token table and limiting the search for the primary token in the token table to token sets within said lower and upper bounds, said bounds being created by dividing the offset of a character of the window of consideration by the segment size to produce an index value equal to an index value in the token table, subtracting a first preselected value from the index value to determine the lower bound and adding a second preselected value to the index value to determine the upper bound.
  - 6. The method of claim 1 further comprising generating the primary exclusive-or token for each said token set and for the window of consideration by dividing each segment into sets, generating an exclusive-or product of each set, and concatenating the exclusive-or products of at least one of said sets and at least another of said sets, and wherein the primary token from the previous window is adjusted in step 2(h) by dividing the primary token into components corresponding to the sets, adjusting each component by exclusive-oring those characters which are leaving the previous window or entering said next window of consideration with the corresponding components which comprise those characters and subsequently recomposing the primary token for said next window of consideration.
  - 7. The method of claim 1 further comprising including characters from the previous window not included in the next window in the difference signature as part of a non-matching segment before generating said primary token for said next window of consideration.
  - 8. The method of claim 1 further comprising including characters from the previous window not included in the next window in the difference signature as part of a non-matching segment after completing step 2(j) of claim 1.
  - 12. The method of claim 1 or 2 wherein step 2(d) further comprises (iii) searching the token table for another matching primary token if the secondary token for said window and the secondary token for the corresponding token set in the token table do not match and the end of the token table has not been reached, and advancing to step 2(g) if said another matching primary token is not found in the token table;
    - (iv) comparing said secondary token for said window to the secondary token in the corresponding token set in the token table in response to finding in the token table said another matching primary token which matches the primary token from said window; and
      
      (v) returning to step 2(d)(ii).
  - 13. A method for producing a duplicate copy of an updated file from said difference signature produced by the method of claim 1 or 2 and from either an original file or a duplicate of said original file, further comprising using the difference signature and the original file or duplicate thereof to assemble a duplicate of the updated file by:
    - (a) using the original file as the source for matching segments;
      
      (b) using the difference signature as the source for non-matching segments; and
      
      (c) assembling the matching and non-matching segments.
  - 14. The method of claim 1 or 2 wherein said original file from which said token table is created is stored in said first storage device, said second storage device has a duplicate of said original file stored in a second storage device, and said duplicate copy of said updated file is produced in said second storage device, further comprising deleting or otherwise modifying said original file in said first storage device at any time after said token table is created by step 1 of claims 1 or 2 without affecting the ability to produce said duplicate copy of said updated file in said second storage device from said duplicate of said original file and said difference signature.
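
The scan recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `diff_signature`, the tuple encoding of the signature (`("match", segment_index, offset)` and `("literal", bytes)`), the dictionary used as the token table, and `zlib.crc32` as the secondary token are all hypothetical choices. Checking every entry that shares a primary token also covers the repeated search of claim 12.

```python
import zlib
from functools import reduce
from operator import xor


def diff_signature(original: bytes, updated: bytes, size: int) -> list:
    # Step (1): token table keyed by primary XOR token; each entry lists the
    # (secondary CRC token, segment index) pairs sharing that primary token.
    table = {}
    for index in range(len(original) // size):
        segment = original[index * size:(index + 1) * size]
        table.setdefault(reduce(xor, segment, 0), []).append(
            (zlib.crc32(segment), index))

    signature = []         # ("match", segment_index, offset) or ("literal", bytes)
    literal = bytearray()  # non-matching characters collected so far
    pos, primary = 0, None
    while pos + size <= len(updated):
        window = updated[pos:pos + size]
        if primary is None:                      # step (b): compute from scratch
            primary = reduce(xor, window, 0)
        hit = None
        if primary in table:                     # step (c)
            secondary = zlib.crc32(window)       # step (d)(i)
            hit = next((i for crc, i in table[primary] if crc == secondary), None)
        if hit is not None:                      # steps (e) and (f)
            if literal:
                signature.append(("literal", bytes(literal)))
                literal.clear()
            signature.append(("match", hit, pos))
            pos += size
            primary = None
        else:                                    # steps (g) and (h)
            literal.append(updated[pos])
            if pos + size < len(updated):
                # Adjust the token: drop the leaving byte, add the entering byte.
                primary ^= updated[pos] ^ updated[pos + size]
            pos += 1
    literal.extend(updated[pos:])                # tail shorter than one window
    if literal:
        signature.append(("literal", bytes(literal)))
    return signature


sig = diff_signature(b"AAAABBBBCCCC", b"AAAAxBBBBCCCC", 4)
# sig == [("match", 0, 0), ("literal", b"x"), ("match", 1, 5), ("match", 2, 9)]
```

In this example every segment has the same XOR token (four identical bytes cancel to zero), so the CRC comparison in step (d) is what identifies the correct segment. After the inserted `x`, the window advances one character and resynchronizes at offset 5.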

2. A method for producing a difference signature of differences between an original file and an updated version of the original file when, in updating the original file, the majority of insertions and deletions of characters in segments of the original file are known to change the offsets of only those segments where said insertions and deletions have been made but do not change the offsets of adjacent segments, comprising;
- (1) creating a token table from an original file in a first storage device by producing a token set for each equal sized contiguous segment of said original file, each token set comprising a primary exclusive-or based token and at least one order sensitive secondary token or cyclic redundancy check product term; and
  
  (2) generating a difference signature, using the token table and an updated file, by;
  
  (a) defining a window of consideration for the updated file, said window being of a size equivalent to the segment size used to create the token set for the original file and comprising successive characters in the updated file;
  
  (b) calculating a primary token for the window of consideration;
  
  (c) searching the token table for a primary token which matches the primary token for the window and advancing to step 2(g) if said matching primary token is not found in the token table;
  
  (d)(i) generating a secondary token for said window in response to finding in the token table a primary token which matches the primary token for the window and comparing the secondary token to the secondary token in the corresponding token set in the token table; and
  
  (ii) advancing to step 2(e) if the secondary tokens match;
  
  (e) logging the offset of the current window to the difference signature to correlate the relative locations of the matching segment in the original and updated files, in response to finding a match between the secondary token from the window and the corresponding secondary token from the token set;
  
  (f) advancing the window of consideration by the segment size to the next segment after the matched text, if there are any remaining segments in the updated file, and resuming the method at step (2b) above;
  
  (g) advancing the window of consideration by the segment size to the next segment to create the next window in response to a failed token search for the previous window, which occurs where either said primary token for the previous window of said updated file is not found in the token table for said original file or where at least one matching primary token is found in the token table but no matching secondary token corresponding to said at least one matching primary token is found in the token table; and
  
  (h) repeating the cycle of steps (2b) through (2g) until the updated file is exhausted.
- - 9. The method of claim 2 comprising generating the primary exclusive-or token for each said token set and for the window of consideration by dividing each segment into sets, generating an exclusive-or product of each set, and concatenating the exclusive-or products of at least one of said sets and at least another of said sets.
  - 10. The method of claim 2 further comprising causing the contents of the entire window to be included in the difference signature before advancing the window of consideration by the segment size in step 2(g).
  - 11. The method of claim 2 further comprising causing the contents of the entire window to be included in the difference signature after completing step 2(h).
  - 15. The invention of claim 9 wherein the second set is the entire segment to which said first set belongs.
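
One way to read the concatenated primary token of claims 9 and 15 is sketched below. The function name `composite_token`, the choice of the leading half of the segment as the first set, and the 8-bit width of each product are assumptions for illustration; the claims only require that the exclusive-or products of at least two sets be concatenated, with the second set being the entire segment in claim 15.

```python
from functools import reduce
from operator import xor


def composite_token(segment: bytes) -> int:
    """Concatenate the XOR of the leading half with the XOR of the whole segment."""
    half = reduce(xor, segment[:len(segment) // 2], 0)  # first set: leading half
    whole = reduce(xor, segment, 0)                     # second set: entire segment
    return (half << 8) | whole                          # two 8-bit products, joined


# A plain XOR cannot tell these two segments apart (both products are 0),
# but the composite token can, because their leading halves differ.
a = composite_token(b"AABB")
b = composite_token(b"ABAB")
```

The composite token is still insensitive to order within each set (`b"ABAB"` and `b"BABA"` collide), which is why the claims pair it with an order-sensitive secondary token.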

16. A method for recording differences between first and second computer data files in a memory media associated with a programmable data processor, said files having a plurality of fixed length segments, comprising the steps of:
- (1) generating a token table in said memory media by
  
  (a) reading a fixed length segment of said data file into said memory media;
  
  (b) generating a primary exclusive-or term for the segment;
  
  (c) generating a secondary order sensitive term for the segment;
  
  (d) concatenating said primary exclusive-or term and said secondary order sensitive term into a token;
  
  (e) recording said token in said memory media; and
  
  (f) repeating steps (a)-(e) for each of said plurality of fixed length segments until all of said segments in said data file have been read and the token table contains one token for each segment in said data file; and
  
  (2) recording differences between the first and second computer data files, using the token table, by
  
  (a) defining a window of consideration for said second data file starting at the first character of said second data file, said window having the same number of characters as each of said plurality of fixed length segments of said first data file;
  
  (b) generating a window exclusive-or term for the window of consideration in the same manner that the primary exclusive-or term for each segment is generated;
  
  (c) searching the token table for a primary exclusive-or term matching the window exclusive-or term;
  
  (d) if a matching primary exclusive-or term is found in the token table,
  
  (i) generating a window order sensitive term for the characters in the window of consideration in the same manner that the secondary order sensitive term for each segment is generated;
  
  (ii) comparing said window order sensitive term with the secondary order sensitive term which forms part of the token corresponding to the matching primary exclusive-or term; and
  
  (iii) when the window exclusive-or term and the order sensitive term match the primary exclusive-or term and secondary order sensitive term of a respective token in the token table, recording information identifying the respective token and recording the offset of the window of consideration and the number of characters in the window of consideration into a difference signature in said memory media, and advancing the window of consideration by the length of a segment to beyond the last character in the current window of consideration and returning to step (2b);
  
  (e) if no match for said window exclusive-or term is found or if no match for the secondary order sensitive term for the window of consideration is found after completion of step (2d), then,
  
  (i) adjusting said window exclusive-or term to remove the exclusive-or representation of the first character of the window of consideration in said window exclusive-or term and to add the exclusive-or representation of the next character beyond the last character in the current window of consideration to the window exclusive-or term and advancing the window of consideration forward by one character; and
  
  (f) repeating steps (2c) through (2e) until the window of consideration reaches the end of the second data file.
- - 17. A method according to claim 16 wherein said secondary order sensitive term comprises a cyclic redundancy product term.
  - 18. The method of claim 16 further comprising, after step (2d)(iii) and before step (2e) of claim 16, when the window order sensitive term for the window of consideration does not match the corresponding secondary order sensitive term for the respective token in the token table, resuming the search of the token table at step (2c) of claim 16 until the token table is exhausted.
  - 19. A method according to claim 16 comprising (1) generating said primary exclusive-or term by (a) calculating the exclusive-or product of each character in the segment;
    - (b) dividing each of said plurality of fixed length segments into equal length subsets;
      
      (c) generating an exclusive-or product for at least one of said subsets of a respective segment; and
      
      (d) including the exclusive-or product for at least one of said subsets in said token for said respective segment by concatenating the exclusive-or product for at least one of said subsets with the primary exclusive-or term to form said primary exclusive-or term;
      
      (2) generating said window exclusive-or term by (a) dividing the window into subwindows; and
      
      (b) generating an exclusive-or product for at least one of said subwindows of a respective window; and
      
      including the exclusive-or product for at least one of said subwindows in the window exclusive-or term by concatenating the exclusive-or product for at least one of said subwindows with the window exclusive-or term to form a concatenated window exclusive-or term; and
      
      (3) said step of adjusting said window exclusive-or term comprises exclusive-oring both the first character of the window of consideration and the next character beyond the last character in the current window of consideration with the exclusive-or product of the segment, exclusive-oring the first character of the window of consideration with all of the subsets included in the primary exclusive-or term which comprise the first character and exclusive-oring the next character beyond the last character in the current window of consideration with all of the subsets included in the primary exclusive-or term which comprise the next character.
  - 20. The method of claim 16 further comprising, in step (2e) of claim 16, before adjusting said window exclusive-or term, recording the first character of the window of consideration into said difference signature in said memory media.
  - 21. The method of claim 16 wherein said step of recording information identifying the respective token in step (2d)(iii) of claim 16 comprises recording an index of the segment in the token table which corresponds to the respective token.
  - 22. The method of claim 16 wherein said step of recording information identifying the respective token in step (2d)(iii) of claim 16 comprises recording the offset of the segment in the first data file which corresponds to the respective token.
  - 26. A method according to claim 16 or 23 for using said difference signature and said first data file to construct said second data file comprising the steps of:
    - (a) reading a section of said difference signature;
      
      (b) if said section indicates a match between segments of said first and second data files, determining the corresponding offset in said first data file from the token table, and reading the character segment at said offset into said second data file;
      
      (c) if said section indicates non-matching characters, reading said characters into said second data file from said difference signature; and
      
      (d) repeating steps (a)-(c) until all sections of the difference signature are read.
  - 27. A method according to claim 16 or 23 to efficiently store multiple versions of a file by storing a copy of said original data file and said difference signature in a memory media.
  - 28. A method according to claim 27 for backing up said second file wherein said programmable data processor for updating said original file is in a first memory device and said copy of said original file and said difference signature is stored in a second memory device.
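
The reconstruction steps of claim 26 can be illustrated with a short sketch. It assumes a hypothetical encoding of the difference signature as a list of `("match", segment_index, offset)` and `("literal", bytes)` entries; neither that encoding nor the function name `reconstruct` comes from the source. Because every segment has the same fixed length, the offset of a matching segment in the first file is taken here to be `segment_index * size`.

```python
def reconstruct(original: bytes, signature: list, size: int) -> bytes:
    """Rebuild the second file from the first file and a difference signature."""
    out = bytearray()
    for entry in signature:                      # step (a): read one section
        if entry[0] == "match":                  # step (b): copy from the first file
            _, index, _offset = entry
            out += original[index * size:(index + 1) * size]
        else:                                    # step (c): copy literal characters
            out += entry[1]
    return bytes(out)                            # step (d): all sections consumed


signature = [("match", 0, 0), ("literal", b"x"), ("match", 1, 5), ("match", 2, 9)]
rebuilt = reconstruct(b"AAAABBBBCCCC", signature, 4)
# rebuilt == b"AAAAxBBBBCCCC"
```

Only the literal bytes travel inside the signature; matched segments are referenced by index, so the receiving location needs nothing more than its own copy of the first file.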

23. A method for quickly recording differences between first and second data files having a plurality of fixed length segments in a memory media associated with a programmable data processor, when the majority of insertions in and deletions of characters in segments of the first file are known to change the offsets of only those segments where said insertions and deletions have been made but do not change the offsets of adjacent segments, comprising the steps of:
- (1) generating a token table in said memory media by
  
  (a) reading a fixed length segment of said data file into said memory media;
  
  (b) generating a primary exclusive-or term for the segment;
  
  (c) generating a secondary order sensitive term;
  
  (d) concatenating said primary exclusive-or term and said secondary order sensitive term into a token;
  
  (e) recording said token in said memory media; and
  
  (f) repeating steps (a)-(e) for each of said plurality of fixed length segments until all of said segments in said data file have been read and the token table contains one token for each segment in said data file; and
  
  (2) recording differences between the first and second computer data files, using the token table, by
  
  (a) defining a window of consideration for said second data file starting at the first character of said second file, said window having the same number of characters as each of said plurality of fixed length segments of said first data file;
  
  (b) generating a window exclusive-or term for the window of consideration in the same manner that the primary exclusive-or term for each segment is generated;
  
  (c) selecting a token from the token table at an index in the token table which is determined by dividing the offset of the window of consideration by the number of characters in a segment to obtain an index value and comparing said window exclusive-or term with the primary exclusive-or term of the selected token in the token table corresponding to the index for the segment containing said index value;
  
  (d) if a matching primary exclusive-or term is found in the token table,
  
  (i) generating a window order sensitive term for the characters in the window of consideration in the same manner that the secondary order sensitive term for each segment is generated;
  
  (ii) comparing said window order sensitive term with the secondary order sensitive term which forms part of the token corresponding to the matching primary exclusive-or term; and
  
  (iii) when the window exclusive-or term and the window order sensitive term match the primary exclusive-or term and secondary order sensitive term of a respective token in the token table, recording the index of the respective token and the offset of the first character of the window of consideration in a difference signature in said memory media;
  
  (e) advancing the window of consideration by the length of a segment to beyond the last character in the current window of consideration and returning to step (2b); and
  
  (f) repeating steps (2c) through (2e) until the window of consideration reaches the end of the second data file.
- - 24. The method of claim 23 wherein the secondary order sensitive term comprises a cyclic redundancy check product term.
  - 25. The method according to claim 24 further comprising, after step (2d) and before step (2e) of claim 24, when the window exclusive-or term or the secondary cyclic redundancy check product term for the window of consideration does not match the corresponding primary exclusive-or term or secondary cyclic redundancy check product term for the respective token in the token table, recording all of the characters within the window of consideration into said difference signature in said memory media.
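
The index selection in step (2c) of claim 23 amounts to an integer division of the window offset by the segment length, so each window is compared against exactly one token. The sketch below illustrates that reading; the function name `indexed_matches`, the `(index, offset)` tuples it returns, and `zlib.crc32` as the order sensitive term are assumptions, not details taken from the source.

```python
import zlib
from functools import reduce
from operator import xor


def indexed_matches(original: bytes, updated: bytes, size: int) -> list:
    # Step (1): one (primary XOR term, secondary CRC term) token per segment.
    table = []
    for start in range(0, len(original) - size + 1, size):
        segment = original[start:start + size]
        table.append((reduce(xor, segment, 0), zlib.crc32(segment)))

    matches = []
    for offset in range(0, len(updated) - size + 1, size):  # step (2e): fixed stride
        index = offset // size                               # step (2c): pick one token
        if index >= len(table):
            break  # second file runs past the first; nothing left to compare
        window = updated[offset:offset + size]
        primary, secondary = table[index]
        # Step (2d): the CRC is only computed when the XOR terms already agree.
        if reduce(xor, window, 0) == primary and zlib.crc32(window) == secondary:
            matches.append((index, offset))                  # step (2d)(iii)
    return matches


found = indexed_matches(b"AAAABBBBCCCC", b"AAAAbbbbCCCC", 4)
# found == [(0, 0), (2, 8)]
```

Because the window never slides by single characters, this variant only suits updates that overwrite data in place, which is the situation the preamble of claim 23 describes.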

Current Assignee: Double-Take Software Canada Incorporated (Open Text Corporation)
Original Assignee: Squibb Data Systems, Inc.
Inventors: Squibb, Mark
Primary Examiner(s): Kriess, Kevin A.
Assistant Examiner(s): Toplu, Lucien
Application Number: US08/039,702
Time in Patent Office: 1,001 Days
Field of Search: 395/600, 395/144, 341/51, 364/955.3, 364/955.5, 364/956.1, 364/962.1, 364/966, 364/265.2, 364/260.81, 364/260.7
US Class Current: 707/695

CPC Class Codes

G06F 16/10   File systems; File servers
H04L 67/06   specially adapted for file ...
H04L 69/329   in the application layer [O...
H04L 9/40   Network security protocols
Y10S 707/917   Text
Y10S 707/968   Partitioning
Y10S 707/99952   Coherency, e.g. same view t...
