Comparing contents of electronic documents
Abstract
A method is described that compares content-rich documents page by page and creates a difference document of paired pages. The pages are compared in three successive passes: first based on their marking operators, then on bitmaps rendered from the still unpaired pages, and finally on a subset of the bitmap, e.g. smaller page areas. Pages that are visually identical are paired. If pages cannot be paired, blank pages are inserted to deal with page insertions and deletions. Differences between pages that can be visible in a printed document are marked on the paired pages. The method can be used with documents that contain embedded graphical content as well as with plain text files.
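The marking of visible differences on a pair of pages can be illustrated with a minimal sketch. It assumes that both pages are already rendered to equally sized grayscale bitmaps, modeled here as lists of pixel rows; the names `Bitmap`, `diff_mask` and `bounding_box` are hypothetical and are not taken from the patent.

```python
from typing import List, Optional, Tuple

# Assumed page model: a bitmap is a list of rows of grayscale pixel values.
Bitmap = List[List[int]]


def diff_mask(a: Bitmap, b: Bitmap) -> List[List[bool]]:
    """Return a mask that is True wherever the two bitmaps differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    return [[pa != pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all differing pixels, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the pages are visually identical
    cols = [x for row in mask for x, flag in enumerate(row) if flag]
    return rows[0], min(cols), rows[-1], max(cols)
```

A viewer could draw a rectangle around the returned box on the visual rendering of the pair; how the differences are actually drawn is left open here.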
5 Claims
1. A method executed in a computer system for comparing electronic documents on a page-by-page basis, the method comprising:
storing in a hash table a hash value of page attributes of a first document;
using the hash value of a page of the second document to determine whether there is a match of the hash value in the hash table;
pairing the page of the second document with the page of the first document that has the hash value in the hash table;
rendering a bitmap of each of the still unpaired pages of the first and second documents;
storing in a hash table a hash value of the bitmap of each of the still unpaired pages of the first document;
forming hash values of the bitmap of each of the unpaired pages of the second document;
pairing the page of the second document with the page of the first document that has the hash value of the bitmap in the hash table;
storing in a hash table a hash value of a subset of the bitmap of each of the still unpaired pages of the first document;
forming hash values of a subset of the bitmap of each of the still unpaired pages of the second document;
pairing the page of the second document with the page of the first document that has the hash value of the subset of the bitmap in the hash table;
pairing a still unpaired page in the first document which immediately follows one page of a page pair, with a still unpaired page in the second document which immediately follows the other page of the page pair;
pairing a still unpaired page in one document with a blank page in the other document if the unpaired page in the one document immediately follows one page of a page pair and if the page in the other document which immediately follows the other page of the page pair is paired; and
highlighting differences between the pages that do not match, on a visual rendering of the pages.
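A single hash-table pairing pass of the kind recited in claim 1 can be sketched as follows. This is only an illustration under stated assumptions: page attributes are modeled as byte strings, SHA-256 stands in for the unspecified hash function, and pages with duplicate hashes are consumed in order. The names `page_hash` and `pair_pages` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def page_hash(attributes: bytes) -> str:
    """Hash the attributes of one page (SHA-256 is an assumption)."""
    return hashlib.sha256(attributes).hexdigest()


def pair_pages(first: List[bytes], second: List[bytes]) -> List[Tuple[int, int]]:
    """Pair pages of `second` with pages of `first` that have the same hash."""
    # Store the hash of every first-document page in a hash table.
    table: Dict[str, List[int]] = {}
    for i, attrs in enumerate(first):
        table.setdefault(page_hash(attrs), []).append(i)

    # Look up each second-document page; a hit yields a page pair.
    pairs: List[Tuple[int, int]] = []
    for j, attrs in enumerate(second):
        candidates = table.get(page_hash(attrs))
        if candidates:
            # Assumption: each first-document page is paired at most once.
            pairs.append((candidates.pop(0), j))
    return pairs
```

The same pass could be repeated on the pages that remain unpaired, feeding it rendered bitmaps and then bitmap subsets instead of page attributes.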
2. A computer program product for comparing electronic documents, the product residing on a computer-readable medium and comprising instructions for causing a computer to:
store in a hash table a hash value of page attributes of a first document, wherein the page attribute is computed of marking operators of a page;
form hash values of page attributes of a second document;
use the hash value of a page of the second document to determine whether there is a match of the hash value in the hash table;
pair the page of the second document with a page of the first document that has the hash value in the hash table;
render a bitmap of the still unpaired pages;
store in the hash table a hash of the bitmap of each of the still unpaired pages of the first document;
form bitmaps of a selected portion of each of the pages;
store in a hash table of page attributes the hash of the bitmap of the selected portion of the page for each of the unmatched pages of the first document;
use the hash value of page attributes of the bitmap of the selected portion of the page of the second document to determine if the hash value of a page of the second document matches a hash value in the hash table; and
pair a still unpaired page in the first document that immediately follows one page of a page pair, with a still unpaired page in the second document that immediately follows the other page of the page pair.
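Hashing a selected portion of a page bitmap, as opposed to the whole bitmap, might look like the sketch below. It assumes a bitmap modeled as one `bytes` object per pixel row and uses SHA-256 as a stand-in hash; `crop` and `bitmap_digest` are hypothetical helper names.

```python
import hashlib
from typing import List

# Assumed page model: one bytes object per pixel row.
Bitmap = List[bytes]


def crop(bitmap: Bitmap, top: int, left: int, height: int, width: int) -> Bitmap:
    """Return the selected rectangular portion of the bitmap."""
    return [row[left:left + width] for row in bitmap[top:top + height]]


def bitmap_digest(bitmap: Bitmap) -> str:
    """Hash a bitmap row by row (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    for row in bitmap:
        # Prefix each row with its length so differently shaped
        # bitmaps with the same pixel bytes do not collide.
        h.update(len(row).to_bytes(4, "big"))
        h.update(row)
    return h.hexdigest()
```

Two pages that differ only outside the selected portion, for example in a header row, produce different whole-bitmap digests but the same digest for the cropped region, which is what lets them be paired in this pass.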
3. A method executed in a computer system for comparing electronic documents on a page-by-page basis, the method comprising:
computing a first digest of marking operators of each page of a first and a second document and pairing the pages of the first document with the pages of the second document that have identical first digests;
computing a second digest of a rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical second digests;
computing a third digest of a subset of the rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical third digests;
pairing an unpaired page in the first document which immediately follows a paired page in the first document, with the page in the second document which immediately follows the other of the paired pages in the second document, if the page which immediately follows the other of the paired pages in the second document is also still unpaired;
pairing any still unpaired pages in the first and second document with a blank page inserted in the second and first document; and
highlighting differences between paired pages that do not have identical first digests, on a visual rendering of the paired pages.
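The two positional steps of claim 3, pairing neighbors of already paired pages and then pairing leftovers with blank pages, can be sketched as follows. Pages are referred to by index, existing pairs are a mapping from first-document index to second-document index, and a blank page is modeled as `None`. Repeating the neighbor rule until nothing changes is an assumption of this sketch, and `pair_neighbors` and `pad_with_blanks` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def pair_neighbors(n_first: int, n_second: int, pairs: Dict[int, int]) -> Dict[int, int]:
    """Pair the unpaired page after a paired page with the unpaired page after its partner."""
    result = dict(pairs)
    used_second = set(result.values())
    changed = True
    while changed:  # assumption: apply the rule until no new pair is found
        changed = False
        for i, j in sorted(result.items()):
            ni, nj = i + 1, j + 1
            if (ni < n_first and nj < n_second
                    and ni not in result and nj not in used_second):
                result[ni] = nj
                used_second.add(nj)
                changed = True
    return result


def pad_with_blanks(
    n_first: int, n_second: int, pairs: Dict[int, int]
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Pair every remaining page with a blank page (None) in the other document."""
    used_second = set(pairs.values())
    rows: List[Tuple[Optional[int], Optional[int]]] = list(pairs.items())
    rows += [(i, None) for i in range(n_first) if i not in pairs]
    rows += [(None, j) for j in range(n_second) if j not in used_second]
    return rows
```

With three pages in the first document, four in the second, and digest-based pairs `{0: 0, 2: 3}`, the neighbor rule adds the pair `1: 1`, and page 2 of the second document ends up opposite a blank page.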
4. A method executed in a computer system for comparing electronic documents on a page-by-page basis, the method comprising:
computing a first digest of marking operators of each page of a first and a second document and pairing the pages of the first document with the pages of the second document that have identical first digests;
computing a second digest of a rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical second digests;
computing a third digest of a subset of the rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical third digests;
pairing an unpaired page in the first document which immediately follows a paired page in the first document, with the page in the second document which immediately follows the other of the paired pages in the second document, if the page which immediately follows the other of the paired pages in the second document is also still unpaired;
pairing any still unpaired pages in the first and second document with a blank page inserted in the second and first document; and
highlighting differences between paired pages that do not have identical first digests, on a visual rendering of the paired pages; and
arranging the paired pages in a difference document based on an original page sequence in one of the documents.
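One possible way to arrange paired pages by the original page sequence of one document is sketched below. It assumes rows of the form `(first_index, second_index)` with `None` for a blank page, orders them by the first document, and slots rows that have a blank first-document page after the row holding the preceding second-document page. That placement rule and the name `arrange` are assumptions of this sketch, not details given in the claim.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int]]


def arrange(rows: List[Row]) -> List[Row]:
    """Order page pairs by the page sequence of the first document."""
    # Rows with a real first-document page follow that document's order.
    ordered = sorted((r for r in rows if r[0] is not None), key=lambda r: r[0])
    # Rows with a blank first-document page are slotted in by second-document order.
    inserts = sorted((r for r in rows if r[0] is None), key=lambda r: r[1])
    for row in inserts:
        position = 0
        for k, (_, other) in enumerate(ordered):
            if other is not None and other < row[1]:
                position = k + 1  # place after the preceding second-document page
        ordered.insert(position, row)
    return ordered
```

For the rows `[(2, 3), (None, 2), (0, 0), (1, 1)]` this yields `[(0, 0), (1, 1), (None, 2), (2, 3)]`, so the inserted page appears where a reader of the second document would expect it.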
5. A method executed in a computer system for comparing electronic documents on a page-by-page basis, the method comprising:
computing a first digest of marking operators of each page of a first and a second document and pairing the pages of the first document with the pages of the second document that have identical first digests;
computing a second digest of a rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical second digests;
computing a third digest of a subset of the rendered bitmap of each page of the first document that is still unpaired, and of each page of the second document that is still unpaired, and pairing the still unpaired pages of the first document with the still unpaired pages of the second document that have identical third digests;
pairing an unpaired page in the first document which immediately follows a paired page in the first document, with the page in the second document which immediately follows the other of the paired pages in the second document, if the page which immediately follows the other of the paired pages in the second document is also still unpaired;
pairing any still unpaired pages in the first and second document with a blank page inserted in the second and first document; and
highlighting differences between paired pages that do not have identical first digests, on a visual rendering of the paired pages; and
arranging in a difference document paired pages that differ from each other with respect to at least one of the first, second and third digests.
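Restricting the difference document to pairs that actually differ could be sketched as a simple filter. The sketch assumes that each page carries a tuple of its first, second and third digests (in practice the later digests would only be computed for pages left unpaired by the earlier passes) and that a pair containing a blank page, modeled as `None`, always counts as differing. The name `differing_rows` is hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed page summary: (first digest, second digest, third digest).
Digests = Tuple[str, str, str]
Row = Tuple[Optional[Digests], Optional[Digests]]


def differing_rows(rows: List[Row]) -> List[Row]:
    """Keep only pairs that differ in at least one of the three digests."""
    kept: List[Row] = []
    for a, b in rows:
        if a is None or b is None:
            kept.append((a, b))  # assumption: a blank partner is a difference
        elif any(x != y for x, y in zip(a, b)):
            kept.append((a, b))
    return kept
```

Pairs whose three digests all agree are dropped, so the resulting document lists only the pages a reviewer needs to inspect.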
Specification