Apparatus and method for reconstructing a file from a difference signature and an original file
First Claim
1. A combination comprising a storage device, a file stored in said storage device, and a token table stored in said storage device, said token table comprising first and second different hashing mathematical representations for each of a plurality of fixed equal length character segments of said file.
7 Assignments
0 Petitions
Accused Products
Abstract
Invention maintains duplicate files in safe places. A SCAN computer program creates a TOKEN Table of an earlier file. The TOKEN Table reflects the indices of successive segments of the file and the exclusive-or (XR) and Cyclic redundancy check (CRC) products of the characters in each segment. An updated file is compared to the earlier file by comparing the XR and CRC products of segments in the updated file to the XR and CRC products in the TOKEN Table. On detecting matching products for identical segments, the next segments are compared. On mismatch, the segment (window) for the updated file is bumped one character and new XR and CRC products generated and compared. The indices of the TOKEN Table and the offsets from the start of the file of the first characters of the updated file matching segments are set forth in a Match Table. Next the updated file is scrolled through for the non-matching information determined by acting on the indices and offsets of the Match Table to form the TRANSITION Table which is the Match Table and the updated file non-matching information. The TRANSITION Table contains the delta information which may be sent to another location having a copy of the earlier file thereat: the whole updated file need not be sent there. A reconstruction program at the location looks at the TRANSITION Table to determine where to get the characters for the copy of the updated file it is creating.
157 Citations
80 Claims
- 1. A combination comprising a storage device, a file stored in said storage device, and a token table stored in said storage device, said token table comprising first and second different hashing mathematical representations for each of a plurality of fixed equal length character segments of said file.
- 8. A computer apparatus comprising a storage device, a file stored in said storage device, means for generating a token table having a signature of said file and storing said token table in said storage device, said means for generating and storing said token table comprising means for generating first and second different hashing mathematical representations for each of a plurality of fixed equal length character segments of said file.
- 14. A method for generating a file signature, in a computer having a memory with a file therein and data processing means for producing a signature of said file, said method comprising generating and maintaining a record of first and second different hashing mathematical representations for each of a plurality of fixed equal length character segments of said file.
- 23. A combination comprising a memory, a signature and a first data file stored in said memory, said signature indicating a difference between said first data file and a second data file with respect to one another, said first and second data files each having successive segments of an equal number of characters and having a reference point at the same character position in both of said first and second data files, said signature comprising a plurality of offsets for said second data file, each offset representing the distance from said reference point in said second data file of a respective segment of said successive segments in said second data file, which is identical to one of said segments in said first data file, the distance from said reference point being the distance of a fixed character position within each respective segment from said reference point.
- 26. A computer apparatus having a memory with first and second files stored therein, said files each having successive segments of characters, comprising data processing means for generating first and second different hashing mathematical representations for each of a plurality of fixed equal length character segments of said files, means for comparing said hashing mathematical representations of said segments of said files to identify identical character sequences of said files, means for generating a file signature of portions of said files that are different, and means for generating offsets indicating a displacement, from a reference point, of character segments in said second file which are identical to character segments in said first file.
- 29. A method of creating, from a first file having fixed length segments of characters and a difference file signature both stored in a computer having a memory and data processing means, a copy of a second file having fixed length segments of characters, said difference file signature indicating similarities and differences between said first file and said second file, an original of said second file having been created before the creation of said difference file signature, said method comprising generating a mathematical representation for each of said segments of the first file to uniquely identify each respective segment, and generating a mathematical representation for each of said segments of the second file to identify each respective segment in the second file, and creating in said difference file signature a record of representations of segments in the original of said second file that are identical to representations of segments in the first file and of the location or offsets of identical segments in the second file, and also storing in the record information which is different in the second file, the offset of said information which is different in said second file being derived from the offsets of identical segments in both said first and second files.
- 35. A method for creating a second window segment token from a base window segment token, said method comprising creating a base window segment in a computer memory by reading a segment of a file in a computer memory, said segment in said file being of the same size as said base window segment, calculating an exclusive-or signature that is an exclusive-or representation of the characters of said base window segment of the file to create a base window segment token, creating said second window segment which comprises all characters of said base window segment except the first character of said base window segment and comprises the next character in said file after said segment in said file which was read to form the base window segment by reading said next character in said file, and creating a second window segment token for said second window segment by adjusting the base window segment token to reflect the deletion of the first character of said base window segment and the addition of said next character in said file in forming the second window segment.
- 36. A storage device having a first file and a signature stored in said storage device, said signature representing differences of said first file with respect to a second file, said signature comprising offsets indicating a displacement of identical fixed length segments in said second file from the location of said segments in the first file.
- 38. A method of periodically maintaining, in a computer having a read/write storage device and data processing means, an archival record of a first version of a file and periodic changes to the first version of the file, the file being at least periodically updated in the read/write storage device, comprising periodically generating, using the data processing means, exclusive-or mathematical representations and order sensitive hash products of fixed length segments of the file, and creating a record in the read/write storage device of those representations and hash products that are identical to representations and hash products of segments in the file at the end of previous periods of said periodical updating, said record further comprising segments in the original version of the file that are different from said segments at the end of said previous periods of said periodical updating.
- 43. A method for producing a copy of a second file that is an updated version of an original file in a storage device, comprising producing a token set from each of the original file and an updated version of said original file, comparing the token sets of the original file and said updated version of said original file to identify the offsets of character segments in the updated version of said original file which are identical to character segments in said original file, thereby also identifying information in said updated version of said original file which is unmatched in said original file, each token set comprising first and second different mathematical representations of each segment of said original file and of each segment of said updated version of said original file.
- 47. A method for producing a first file representative of differences between second and third files, comprising generating first and second hash tables from successive equal length segments of each of said second and third files by generating successive first and second different mathematical representations of equal length segments of said second and third files, respectively, wherein a segment is a set of successive characters, comparing said first and second hash tables, with respect to said second and third files, to identify segments of said second and third files that match one another, and producing said first file by listing segments of said second and third files that match one another.
-
52. A method for producing a token table from a computer data file in a memory media, the method comprising, for each of a plurality of fixed length segments in the data file, the steps of:
-
(a) generating a first hashing representation for the segment;
(b) generating a second hashing representation for the segment; and
(c) concatenating said first and second hashing representations into a segment representation. - View Dependent Claims (53, 54, 55, 56, 57, 58)
(d) reading a respective fixed length segment of said data file into said memory media before generating said first hashing representation; - and
(e) recording said segment representation in said memory media; and
repeating steps (a)-(e) for each of said plurality of fixed length segments until all of said segments in said data file have been read and the token table contains one segment representation for each segment in said data file.
-
-
54. The method of claim 52 wherein said plurality of segments are nonoverlapping.
-
55. The method of claim 54 wherein said plurality of segments are consecutive.
-
56. The method of claim 52 wherein said first hashing representation is an exclusive-or representation and said second hashing representation is a non-exclusive-or representation.
-
57. The method of claim 56 comprising generating said exclusive-or representation by
(a) calculating the exclusive-or product of each character in the segment; -
(b) dividing the segment into equal length subsets;
(c) generating an exclusive-or product for at least one of said subsets of the segment; and
(d) concatenating the exclusive-or product for at least one of said subsets with the exclusive-or product of each character in the segment to form said exclusive-or representation.
-
-
58. The method of claim 56 wherein said non-exclusive-or hashing representation is a cyclic redundancy check product.
-
59. A method for producing a token table from a computer data file in a memory media, the method comprising, for each of a plurality of fixed length segments in the data file, the steps of:
-
(a) generating a first hashing representation for the segment using a first algorithm; and
(b) generating a second hashing representation for the segment using a different algorithm from said first algorithm, said first and second hashing representations together comprising a segment representation for each segment.
-
- 60. A token table in a memory medium comprising first and second different hashing representations of each of at least two nonoverlapping, fixed equal length segments in a file in said computer.
-
64. A method for storing different versions of a computer file having a plurality of fixed length segments in a first memory media associated with a data processor, comprising the steps of:
-
(1) generating a token table in said first memory media, said token table comprising two hashing representations for each of said plurality of fixed length segments of a first version of said computer file, (2) generating a record of differences between the first version of said computer file and a second version of said computer file using said token table, and (3) storing said record in a second memory media. - View Dependent Claims (65)
-
-
75. A method for producing a token table from a computer data file in a memory media, the method comprising, for each of a plurality of fixed length segments in the data file, the steps of:
-
(a) generating a first hashing representation for the segment;
(b) generating a second hashing representation for the segment; and
(c) associating the first representation with the second representation in the token table. - View Dependent Claims (76)
-
Specification