Byte-level file differencing and updating algorithms
First Claim
1. A system for updating electronic files comprising byte-level file differencing and updating (FDU) algorithms, wherein the file differencing algorithm of the FDU algorithms is hosted on a first device, the file differencing algorithm:
- receiving two byte streams including byte streams corresponding to an original file and a new file, wherein the new file includes updated and revised versions of the original file;
determining a longest common sub-string (LCS) between the two byte streams and divides each of the two byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS;
recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size and determines a byte-level difference between each of the corresponding sub-streams;
defining a protocol for structuring a delta file by using a set of operation codes and a variable length integer format to eliminate redundant information in the delta file;
generating the delta file including an operation array that codes the determined byte-level differences, wherein generating comprises merging operations of the operation array to reduce an amount of meta-data;
transferring the delta file to a second device via at least one coupling.
6 Assignments
0 Petitions
Accused Products
Abstract
A method for performing differencing and updating between electronic files is provided. A byte-level file differencing algorithm receives two byte streams corresponding to an original file and a new file. The new file includes updated and revised versions of the original file. The file differencing algorithm determines a longest common sub-string (LCS) between the two byte streams and divides each of the two byte streams into sub-streams. The sub-streams include the LCS along with prefix and suffix sub-streams to the LCS. The file differencing algorithm then recursively determines an LCS and divides each sub-stream until a size of the sub-streams is less than a pre-specified size. Byte-level differences are then identified between each of the corresponding sub-streams. Further, the file differencing algorithm defines a protocol for structuring a delta file by using a set of operation codes and a variable length integer format to eliminate redundant information in the delta file. Using this protocol, the file differencing algorithm generates the delta file including an operation array that codes the identified byte-level differences.
-
Citations
28 Claims
-
1. A system for updating electronic files comprising byte-level file differencing and updating (FDU) algorithms, wherein the file differencing algorithm of the FDU algorithms is hosted on a first device, the file differencing algorithm:
-
receiving two byte streams including byte streams corresponding to an original file and a new file, wherein the new file includes updated and revised versions of the original file;
determining a longest common sub-string (LCS) between the two byte streams and divides each of the two byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS;
recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size and determines a byte-level difference between each of the corresponding sub-streams;
defining a protocol for structuring a delta file by using a set of operation codes and a variable length integer format to eliminate redundant information in the delta file;
generating the delta file including an operation array that codes the determined byte-level differences, wherein generating comprises merging operations of the operation array to reduce an amount of meta-data;
transferring the delta file to a second device via at least one coupling. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for performing byte-level file differencing, comprising:
-
identifying a longest common sub-string (LCS) between two received byte streams, wherein the byte streams include a byte stream corresponding to an original file and a byte stream corresponding to a new file version of the original file;
dividing each of the two received byte streams into sub-streams, wherein a first sub- stream is a prefix and a second sub-stream is a suffix to the respective LCS;
recursively determining an LCS and dividing each sub-stream until a size of the sub- streams is less than a pre-specified size or an LCS is absent;
determining a byte-level difference between each of the corresponding sub-streams;
generating an operation array that codes the determined byte-level differences, wherein generating includes merging operations of the operation array to reduce an amount of meta-data; and
writing the operation array to a delta file, wherein writing comprises merging operations of the operation array to reduce an amount of meta-data and using a protocol for structuring the delta file by using a set of operation codes and a variable length integer format to eliminate redundant information in the delta file. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A method for generating a difference file between electronic files, comprising:
-
receiving byte streams including an original byte stream and a new byte stream;
determining a size of each of the byte streams;
computing a longest common sub-string (LCS) between the byte streams using a linear- time analysis when each of the byte streams is of a first size;
generating at least one prefix byte stream set from each of the byte streams, wherein each prefix byte stream set includes bytes preceding the LCS;
generating at least one suffix byte stream set from each of the byte streams, wherein each suffix byte stream set includes bytes following the LCS;
recursively determining a size, computing the LCS, and generating at least one prefix and at least one suffix byte stream set for each of the generated prefix and suffix byte stream sets until the size of the generated prefix and suffix byte stream sets is less than a second size;
determining an edit distance between bytes of each of the generated prefix byte stream sets and between bytes of each of the generated suffix byte stream sets, wherein the edit distance is a minimum number of edit operations to transform the original byte stream into the new byte stream; and
generating a delta file comprising an operation array that includes information of the edit distance between the byte streams. - View Dependent Claims (24)
-
-
25. A method for performing byte-level file differencing and updating, comprising:
-
receiving two byte streams in a first system, wherein the two byte streams include byte streams corresponding to both an original file and a new file;
determining a longest common sub-string (LCS) between the two byte streams and dividing each of the two byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS;
recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size and determining a byte-level difference between each of the corresponding sub-streams;
generating a delta file including an operation array that codes the determined byte-level differences, wherein generating comprises merging at least one operation of the operation array to reduce an amount of meta-data and writing the operation array using a variable length integer format;
transferring the delta file to a second system;
generating a copy of the new file in the second system using the delta file;
updating the original file in the second system using the copy of the new file.
-
-
26. An apparatus including at least one processing device configured to perform byte-level file differencing and updating, comprising:
-
means for receiving two byte streams including a byte stream corresponding to an original file and a byte stream corresponding to a new file;
means for determining a longest common sub-string (LCS) between the two received byte streams;
means for dividing each of the two received byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS;
means for recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size;
means for determining a byte-level difference between each of the sub-streams, wherein byte-level differences are determined between the original file and the new file;
means for generating an operation array that codes the determined byte-level differences;
means for optimizing the operation array by merging operations to reduce file meta-data; and
means for generating a delta file by writing the optimized operation array using a variable length integer format.
-
-
27. An apparatus including at least one processing device configured to perform byte-level file differencing and updating, comprising:
-
means for receiving a delta file, wherein generation of the delta file comprises, determining a longest common sub-string (LCS) between two byte streams including a byte stream corresponding to an original file and a byte stream corresponding to a new version of the original file, dividing each of the two byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS, recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size, determining a byte-level difference between each of the corresponding sub- streams, generating the delta file by coding the byte-level differences in an operation array, merging at least one operation of the operation array to reduce an amount of meta-data, and writing the operation array using a variable length integer format;
means for generating a copy of the new file using the delta file; and
means for updating the original file in a second device using the copy.
-
-
28. A computer readable medium including executable instructions which, when executed in a processing system, performs byte-level file differencing and updating by:
-
determining a longest common sub-string (LCS) between two received byte streams, wherein the byte streams include a byte stream corresponding to an original file and a byte stream corresponding to a new file version of the original file;
dividing each of the two received byte streams into sub-streams, wherein a first sub-stream is a prefix and a second sub-stream is a suffix to the respective LCS;
recursively determining an LCS and dividing each sub-stream until a size of the sub-streams is less than a pre-specified size;
determining a byte-level difference between each of the corresponding sub-streams;
generating an operation array that codes the determined byte-level differences, wherein generating includes merging at least one operation of the operation array to reduce an amount of meta-data; and
writing the operation array to a delta file using a variable length integer format.
-
Specification