Method and apparatus for finding differences between two computer files efficiently in linear time and for using these differences to update computer files
First Claim
1. A method for converting an OLD computer file into an updated NEW computer file, said method comprising the steps of:
- a. identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, wherein said NEW file is divided into b locks of fixed size and each block is successively compared to a contiguous group of blocks of fixed size in said OLD file;
b. creating a patch file based on identified differences, wherein said patch file incorporates matched strings in said OLD file and matched strings in the prior portion of said NEW file; and
c. combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file.
1 Assignment
0 Petitions
Accused Products
Abstract
A method of updating a computer file from an old file into a new file comprises blocking the new file and the old file into fixed-size blocks, maintaining a window (collection of contiguous blocks) for each file on which lookup preprocessing has been performed, and performing match processing on each new file block in turn (comparing against both the old and new windows) using a key-sampling technique combined with approximate matching. For each new file block, the match information is then optimized for coding efficiency and encoded into a patch file that describes an algorithm for converting the old file into the new file. The patch file application method and apparatus then performs the algorithm described in the patch file. The method uses a fixed amount of random-access memory regardless of the sizes of the two files and uses no temporary mass storage. In addition, the method has a running time roughly proportional to the size of the new file and allows the use of parallel processing to reduce the time required. The system and method produce patch files which are smaller than prior systems and methods, and allow the operator of the apparatus to perform an efficiency/effectiveness trade-off.
45 Citations
38 Claims
-
1. A method for converting an OLD computer file into an updated NEW computer file, said method comprising the steps of:
-
a. identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, wherein said NEW file is divided into b locks of fixed size and each block is successively compared to a contiguous group of blocks of fixed size in said OLD file;
b. creating a patch file based on identified differences, wherein said patch file incorporates matched strings in said OLD file and matched strings in the prior portion of said NEW file; and
c. combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. An apparatus for controlling a general purpose computer so as to compare an OLD file and a NEW file, to construct a patch file containing the differences between said files, and subsequently enabling a general purpose computer provided with said patch file to construct said NEW file from said OLD file and said patch file, comprising:
-
a. file input storage means for storing two currently selected blocks of said NEW file and a plurality of currently selected blocks of said OLD file;
b. string table storage means for storing a plurality of OLD string tables and a NEW string table, each comprising a plurality of fixed-length substrings and associated indices within a currently selected block of OLD or NEW file;
c. a string table preparation circuit selecting and indexing successive portions of a currently selected block of OLD file so that contents of said string table storage means are constructed based on said selected block;
d. match table storage means for storing data structures describing approximate matches between regions of currently selected block of NEW file and;
i. earlier regions of said currently selected block of NEW file or ii. regions of previously selected block of NEW file or iii. regions of one of a plurality of currently selected blocks of OLD file;
e. a match table preparation circuit forming contents of said match table storage means by;
i. using contents of string table storage means to locate copies of fixed-length substrings of said selected block of NEW file in;
(a) earlier portions of said currently selected block of NEW file or (b) previously selected block of NEW file or (c) One of said currently selected blocks of OLD file, ii. extending said matches to longer approximate matches in the forward and reverse directions, iii. recording said approximate matches and mismatch information, and iv. updating said NEW string table as portions of said NEW file are processed;
f. a match table optimization circuit controlling the lengths of the approximate matches in the data structures in the match table storage means to produce a minimized patch file, said optimization based on estimates of relative coding efficiency of matches and mismatches;
g. a patch file encoding circuit producing successive blocks of said patch file from optimized match table; and
h. a patch file apply circuit combining said patch file and said OLD file, producing said NEW file. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
-
-
37. A system for converting an OLD computer file into an updated NEW computer file, the system comprising:
-
a. means for identifying differences between said OLD file and a NEW computer file by passing both files through a patch build program, wherein said NEW file is divided into blocks of fixed size and each block is successively compared to a contiguous group of blocks of fixed size in said OLD file;
b. creating a patch file based on identified differences, wherein said patch file incorporates not only matched strings in said OLD file but also matched strings in the prior portion of said NEW file; and
c. combining said OLD file and said patch file by passing both files through a patch apply program to obtain said NEW file.
-
-
38. A method of updating a computer file from an old file into a new file comprising the steps of:
-
blocking the new file and the old file into fixed-size blocks;
maintaining a collection of contiguous blocks for each file on which lookup preprocessing has been performed; and
performing match processing on each new file block in turn by comparing against both the old and new contiguous blocks using a key-sampling technique combined with approximate matching.
-
Specification