Methods and systems for differencing orderly dependent files
First Claim
1. A method comprising:
receiving a reference file and a target file that are orderly dependent of each other, the reference and target files having common blocks that appear in a same order in both the reference and target files;
generating difference data between the reference and target files;
generating a difference file comprising the difference data;
receiving the difference file in a first computer over a computer network; and
in the first computer, reconstructing the target file using the difference data from the difference file and a copy of the reference file stored in the first computer, wherein generating the difference data between the reference and target files comprises:
dividing the reference file into a plurality of chunks;
loading a first chunk and a second chunk of the reference file into a main memory of a second computer;
loading a part of the target file into the main memory of the second computer;
creating a rolling hash table of the first chunk and a rolling hash table of the second chunk of the reference file, the rolling hash table of the first chunk and the rolling hash table of the second chunk being separate rolling hash tables; and
identifying substrings that are common to both the reference and target files by comparing the rolling hash table of the first chunk of the reference file to a hash of the part of the target file.
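The chunking and hashing steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the polynomial rolling hash, the window length, the chunk size, and every function name (split_chunks, build_table, find_matches) are assumptions chosen for illustration. Each chunk of the reference file gets its own, separate table, and a part of the target file is scanned against one chunk's table.

```python
BASE = 257               # assumed hash base (hypothetical choice)
MOD = (1 << 61) - 1      # assumed modulus (hypothetical choice)
WINDOW = 4               # assumed substring length hashed at each offset


def rolling_hashes(data: bytes, window: int = WINDOW):
    """Yield (offset, hash) for every window-length substring of data."""
    if len(data) < window:
        return
    top = pow(BASE, window - 1, MOD)
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(window, len(data)):
        # Drop the outgoing byte, shift, and add the incoming byte.
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        yield i - window + 1, h


def split_chunks(reference: bytes, chunk_size: int):
    """Divide the reference into (base_offset, chunk) pairs."""
    return [(i, reference[i:i + chunk_size])
            for i in range(0, len(reference), chunk_size)]


def build_table(chunk: bytes, window: int = WINDOW) -> dict:
    """Build one table per chunk: hash -> first offset inside that chunk."""
    table = {}
    for off, h in rolling_hashes(chunk, window):
        table.setdefault(h, off)
    return table


def find_matches(chunk: bytes, base_offset: int, table: dict,
                 target_part: bytes, window: int = WINDOW):
    """Return (target_offset, reference_offset, length) for common substrings."""
    hashes = dict(rolling_hashes(target_part, window))  # offset -> hash
    matches, t = [], 0
    while t + window <= len(target_part):
        r = table.get(hashes[t])
        # Verify the bytes so a hash collision is never reported as a match.
        if r is not None and chunk[r:r + window] == target_part[t:t + window]:
            length = window
            while (r + length < len(chunk) and t + length < len(target_part)
                   and chunk[r + length] == target_part[t + length]):
                length += 1
            matches.append((t, base_offset + r, length))
            t += length
        else:
            t += 1
    return matches
```

In this sketch a match is extended only inside the chunk that is currently loaded, which mirrors the idea of keeping a limited portion of the reference file in main memory at a time.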
Abstract
Difference data is generated between a reference file and a target file that are orderly dependent having common blocks that appear in the same order in both the reference and target files. The difference data is generated by comparing hash values of chunks of the reference file against hash values of parts of the target file to identify copy operations between the reference and target files. Chunks of the reference file and parts of the target file are loaded into main memory to create hashes for comparison and unloaded from the main memory after exhaustion. The difference data is included in a difference file, which is provided to one or more endpoint computers. In an endpoint computer, the target file is reconstructed using a copy of the reference file and the difference data from the difference file.
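The reconstruction step at the endpoint can be sketched as follows. The abstract does not specify an encoding for the difference data, so the tuple layout used here, ("copy", reference_offset, length) and ("add", literal_bytes), and the function name reconstruct are assumptions made only for illustration.

```python
def reconstruct(reference: bytes, ops) -> bytes:
    """Rebuild the target from a local copy of the reference and a list of ops.

    ops is a hypothetical difference-data format:
      ("copy", reference_offset, length)  -> reuse bytes from the reference
      ("add", literal_bytes)              -> bytes that exist only in the target
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += reference[offset:offset + length]
        elif op[0] == "add":
            out += op[1]
        else:
            raise ValueError(f"unknown operation: {op[0]!r}")
    return bytes(out)


# Usage: the endpoint holds the reference and receives only the ops.
reference = b"abcdefghijklmnop"
ops = [("add", b"XX"), ("copy", 2, 6), ("add", b"YY")]
target = reconstruct(reference, ops)
```

Only the literal bytes and the small copy records travel over the network in this sketch; the copied content comes from the reference file the endpoint already has.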
43 Citations
18 Claims
9. A system comprising:
a computer system that receives a reference file and a target file that are orderly dependent having common blocks that appear in a same order in both the reference and target files, generates a difference data for the reference and target files, formats the difference data into a difference file, and transmits the difference file to a plurality of endpoint computers; and
an endpoint computer in the plurality of endpoint computers that receives the difference file, obtains the difference data from the difference file, and reconstructs the target file in the endpoint computer using a copy of the reference file and the difference data,
wherein the computer system generates the difference data by dividing the reference file into a plurality of chunks, loading a first chunk of the reference file into a main memory of the computer system, loading a second chunk of the reference file into the main memory of the computer system, loading a part of the target file into the main memory of the computer system, creating a rolling hash table of the first chunk of the reference file, creating a rolling hash table of the second chunk of the reference file that is separate from the rolling hash table of the first chunk of the reference file, and comparing the rolling hash table of the first chunk of the reference file to a hash of the part of the target file to identify substrings that are common to both the reference and target files.
14. A method comprising:
receiving a reference file and a target file that are determined to be orderly dependent of each other, the reference and target files having common blocks that appear in a same order in both the reference and target files;
dividing the reference file into a plurality of chunks;
loading a first chunk in the plurality of chunks, a second chunk in the plurality of chunks, and a part of the target file into a main memory of a computer system;
creating a rolling hash table of the first chunk;
creating a rolling hash table of the second chunk that is separate from the rolling hash table of the first chunk;
comparing the rolling hash table of the first chunk and a hash of the first part of the target file to identify a copy operation for reconstructing the target file in an endpoint computer using a copy of the reference file and a listing of copy operations;
including the listing of copy operations in a difference file; and
providing the difference file to the endpoint computer.
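The last two steps of claim 14, including a listing of copy operations in a difference file and providing it to the endpoint, might look like the sketch below. The claim does not define a file format, so the JSON layout, the base64 encoding of literal bytes, and the names build_ops and write_diff are all hypothetical. The input matches are assumed to be sorted, non-overlapping (target_offset, reference_offset, length) triples produced by some earlier comparison step.

```python
import base64
import json


def _add_op(data: bytes) -> dict:
    # Literal bytes are base64-encoded so they survive JSON (assumed format).
    return {"op": "add", "data": base64.b64encode(data).decode("ascii")}


def build_ops(target: bytes, matches) -> list:
    """Turn matches into an ordered listing of copy and add operations."""
    ops, pos = [], 0
    for t_off, r_off, length in matches:
        if t_off > pos:
            # Bytes between matches exist only in the target.
            ops.append(_add_op(target[pos:t_off]))
        ops.append({"op": "copy", "offset": r_off, "length": length})
        pos = t_off + length
    if pos < len(target):
        ops.append(_add_op(target[pos:]))
    return ops


def write_diff(ops: list) -> str:
    """Serialize the listing as the body of a hypothetical difference file."""
    return json.dumps({"ops": ops})


diff_text = write_diff(build_ops(b"XXcdefghYY", [(2, 2, 6)]))
```

Because the files are orderly dependent, the copy operations in such a listing refer to reference offsets that only increase, which is what lets the reference be processed chunk by chunk.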
Specification