Method and apparatus for finding differences between two computer files efficiently in linear time and for using these differences to update computer files
First Claim
1. A method for converting an OLD computer file into an updated NEW computer file, said method comprising the steps of:
- a. identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, the patch build program comprising the steps of;
i. identifying matches between the OLD file and the NEW computer file by(a). dividing the NEW computer file into blocks of fixed size,(b). successively comparing each block to a contiguous group of blocks of fixed size in the OLD file, and(c). scanning each pair of adjacent blocks of the NEW computer file for matched strings whose starting positions in the NEW computer file differ by less than a fixed window size;
.ii. creating a patch file based on identified matches in step a., wherein the patch file comprises a plurality of blocks corresponding to the fixed size blocks of the NEW computer file and incorporates matched strings in the OLD file and the matched strings in the NEW computer file; and
b. combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file.
1 Assignment
0 Petitions
Accused Products
Abstract
A method of updating a computer file from an old file into a new file comprises blocking the new file and the old file into fixed-size blocks, maintaining a window (collection of contiguous blocks) for each file on which lookup preprocessing has been performed, and performing match processing on each new file block in turn (comparing against both the old and new windows) using a key-sampling technique combined with approximate matching. For each new file block, the match information is then optimized for coding efficiency and encoded into a patch file that describes an algorithm for converting the old file into the new file. The patch file application method and apparatus then performs the algorithm described in the patch file. The method uses a fixed amount of random-access memory regardless of the sizes of the two files and uses no temporary mass storage. In addition, the method has a running time roughly proportional to the size of the new file and allows the use of parallel processing to reduce the time required. The system and method produce patch files which are smaller than prior systems and methods, and allow the operator of the apparatus to perform an efficiency/effectiveness trade-off.
47 Citations
37 Claims
-
1. A method for converting an OLD computer file into an updated NEW computer file, said method comprising the steps of:
-
a. identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, the patch build program comprising the steps of; i. identifying matches between the OLD file and the NEW computer file by (a). dividing the NEW computer file into blocks of fixed size, (b). successively comparing each block to a contiguous group of blocks of fixed size in the OLD file, and (c). scanning each pair of adjacent blocks of the NEW computer file for matched strings whose starting positions in the NEW computer file differ by less than a fixed window size; .ii. creating a patch file based on identified matches in step a., wherein the patch file comprises a plurality of blocks corresponding to the fixed size blocks of the NEW computer file and incorporates matched strings in the OLD file and the matched strings in the NEW computer file; and b. combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. An apparatus for controlling a general purpose computer so as to compare an OLD file and a NEW file, to construct a patch file containing the differences between said files, and subsequently enabling a general purpose computer provided with said patch file to construct said NEW file from said OLD file and said patch file, comprising:
-
a. file input storage means for storing two currently selected blocks of said NEW file and a plurality of currently selected blocks of said OLD file; b. string table storage means for storing a plurality of OLD string tables and a NEW string table, each comprising a plurality of fixed-length substrings and associated indices within a currently selected block of the OLD or the NEW file selected from the file input storage means; c. a string table preparation circuit for selecting and indexing successive portions of each of the plurality of currently selected blocks of the OLD file so that contents of said string table storage means are constructed based on said selected block; d. match table storage means for storing data structures describing exact matches and approximate matches between regions of the currently selected block of the NEW file and; i. earlier regions of said currently selected block of the NEW file or ii. regions of the previously selected block of the NEW file or iii. regions of one of a plurality of currently selected blocks of the OLD file; e. a match table preparation circuit forming contents of said match table storage means by; i. using contents of the string table storage means to locate copies of fixed-length substrings of said selected block of the NEW file in; (a) earlier portions of said currently selected block of the NEW file or (b) previously selected block of the NEW file or (c) one of said currently selected blocks of the OLD file, ii. extending said matches to longer approximate matches in the forward and reverse directions, iii. recording said matches, said approximate matches and mismatch information, and iv. updating said NEW string table as portions of said NEW file are processed; f. a match table optimization circuit controlling the lengths of the approximate matches in the data structures in the match table storage means to produce a minimized patch file, said optimization based on estimates of relative coding efficiency of matches and mismatches; g. a patch file encoding circuit producing successive blocks of said patch file from optimized match table; and h. a patch file apply circuit combining said patch file and said OLD file, producing said NEW file. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
-
-
37. A system for converting an OLD computer file into an updated NEW computer file, the system comprising:
-
a. means for identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, wherein said patch build program comprises; i. means for identifying matches in which the NEW file is divided into blocks of fixed size and (a.) each block is successively compared to a contiguous group of blocks of fixed size in the OLD file; and (b.) each pair of adjacent blocks of the NEW file is scanned for matched strings whose starting positions in the NEW file differ by less than a fixed window size; ii. means for creating a patch file based on matches identified by the match identification means, wherein said patch file comprises a plurality of blocks corresponding to the fixed size blocks of the NEW file and incorporates matched string in the OLD file and the matched strings in said NEW file; and b. means for combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file.
-
Specification