SYSTEM AND METHOD FOR DIFFERENTIAL COMPRESSION OF DATA FROM A PLURALITY OF BINARY SOURCES
First Claim
1. A method for handling first and second versions of a data object for storage and transmission, the method comprising the steps of:
- performing a consecutive linear pass over both versions;
identifying a first matching byte-level string, occurring in the second version, which matches a string in the first version; and
producing a set containing (i) identified matching segments and, (ii) for each of the identified matching segments, an offset reflecting the relative positions of the matching segments within the first and second versions;
whereby, for versions whose matching strings are both local and sequential, the set closely approximates a set produced by a method of scanning the entire second version to find a best possible match with a string in the first version.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and a system are presented for generating differentially compressed output from binary sources. Given two versions of the same file as input streams, a compact encoding of one of the input streams is generated, by representing it as a set of changes with respect to the other input stream. Algorithms for differencing files requiring time linear in the size of the input and a constant amount of space for execution are presented. In addition, advanced techniques for improving existing differencing algorithms are developed and applied to previous methods. These techniques allow algorithms to increase their efficiency without a loss of compression and to accept arbitrarily large inputs without sacrificing correctness or degrading the compression data rate. The differential compression methods provide a computationally efficient compression technique for applications that generate versioned data.
-
Citations
1 Claim
-
1. A method for handling first and second versions of a data object for storage and transmission, the method comprising the steps of:
-
performing a consecutive linear pass over both versions;
identifying a first matching byte-level string, occurring in the second version, which matches a string in the first version; and
producing a set containing (i) identified matching segments and, (ii) for each of the identified matching segments, an offset reflecting the relative positions of the matching segments within the first and second versions;
whereby, for versions whose matching strings are both local and sequential, the set closely approximates a set produced by a method of scanning the entire second version to find a best possible match with a string in the first version.
-
Specification