METHOD AND APPARATUS FOR DOCUMENT CONVERSION
First Claim
1. A method of converting an input document having a fixed-layout into a portable format, the method comprising, for each page of the input document:
extracting a font used on the page;
adjusting metrics of the converted font;
converting the extracted font into a format compatible with a web browser program;
extracting text from the page;
rendering content of the page, other than the font and the text, as one or more content images;
storing the converted font, the extracted text and the one or more content images;
combining the converted font, the extracted text and the one or more content images as a corresponding page of an output document formatted according to a markup language compatible with the web browser program;
comparing an image of the page of the input document to an image of the corresponding page of the output document to generate an error score; and
if the error score exceeds a threshold, replacing the corresponding page of the output document with an image of the page of the input document.
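The final two steps of the claim (comparing page images to produce an error score, then falling back to an image of the input page when the score exceeds a threshold) can be illustrated with a minimal sketch. The claim does not name a comparison metric, so the mean absolute pixel difference used here is an assumption, as are the grayscale list-of-rows image representation and the names `error_score` and `choose_page`.

```python
from typing import List

# Assumed representation: grayscale rows with pixel values 0..255.
Image = List[List[int]]


def error_score(original: Image, converted: Image) -> float:
    """Mean absolute pixel difference, normalized to the range 0.0..1.0.

    This metric is an illustrative assumption, not one named in the claim.
    """
    if len(original) != len(converted) or any(
        len(a) != len(b) for a, b in zip(original, converted)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(
        abs(p - q)
        for row_a, row_b in zip(original, converted)
        for p, q in zip(row_a, row_b)
    )
    pixels = sum(len(row) for row in original)
    return total / (255 * pixels) if pixels else 0.0


def choose_page(original: Image, converted: Image, threshold: float) -> str:
    """Return "markup" to keep the converted page, or "image" to fall back."""
    # "Exceeds" is read strictly: a score equal to the threshold is kept.
    return "image" if error_score(original, converted) > threshold else "markup"
```

In this sketch a page whose rendering matches the original exactly scores 0.0 and is kept as markup, while a page that drifts past the chosen threshold is replaced by the image of the input page.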
Abstract
Method and apparatus for converting a document from a fixed-layout format (e.g., Microsoft Office, Adobe PDF) into a non-fixed-layout format (e.g., HTML) portable to different platforms (e.g., desktop computers, tablet computers, smart phones) running different operating systems (e.g., Microsoft Windows, Apple OS X) and different web browsers (e.g., Microsoft Internet Explorer, Apple Safari, Mozilla Firefox). In one stream, fonts are identified, extracted, and processed to enhance compatibility with the portable format. In another stream, textual content is extracted and processed to enhance compatibility, and images are taken of non-textual content. These images are used as backgrounds in the output document, over which the textual content is rendered in the appropriate fonts, with sizing, spacing, positioning and/or other characteristics matching or closely approximating those of the original document. Error detection is applied by comparing images of the original document to corresponding images of the output document, to ensure high fidelity.
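The layering described in the abstract (a background image of non-textual content, with extracted text positioned over it in the converted font) might be assembled along the following lines. This is a sketch under stated assumptions: the `TextRun` type, the `build_page_html` function, the pixel-based coordinates and the file names are all hypothetical, and the font name and URLs are assumed to be trusted values that need no escaping.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class TextRun:
    """One piece of extracted text with its position on the page (hypothetical type)."""
    text: str
    left_px: float
    top_px: float
    font_size_px: float


def build_page_html(font_family: str, font_url: str, background_url: str,
                    width_px: int, height_px: int, runs: List[TextRun]) -> str:
    """Compose one output page: content image as background, text spans on top."""
    # Each run becomes an absolutely positioned span; its text is HTML-escaped.
    spans = "\n".join(
        f'  <span style="left:{r.left_px}px;top:{r.top_px}px;'
        f'font-size:{r.font_size_px}px">{escape(r.text)}</span>'
        for r in runs
    )
    return (
        "<style>\n"
        # Register the converted font so the browser can load it.
        f"@font-face {{ font-family: '{font_family}'; src: url('{font_url}'); }}\n"
        f".page {{ position: relative; width: {width_px}px; height: {height_px}px;\n"
        f"  background: url('{background_url}') no-repeat; font-family: '{font_family}'; }}\n"
        ".page span { position: absolute; white-space: pre; }\n"
        "</style>\n"
        f'<div class="page">\n{spans}\n</div>\n'
    )
```

For example, `build_page_html("DocFont", "fonts/docfont.woff", "page1.png", 816, 1056, [TextRun("Hello", 72, 96, 16)])` returns markup for a single page with one positioned text run. Absolute positioning is only one way to preserve the original layout; the abstract does not prescribe a particular technique.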
47 Citations
19 Claims
1. A method of converting an input document having a fixed-layout into a portable format, the method comprising, for each page of the input document:
extracting a font used on the page;
adjusting metrics of the converted font;
converting the extracted font into a format compatible with a web browser program;
extracting text from the page;
rendering content of the page, other than the font and the text, as one or more content images;
storing the converted font, the extracted text and the one or more content images;
combining the converted font, the extracted text and the one or more content images as a corresponding page of an output document formatted according to a markup language compatible with the web browser program;
comparing an image of the page of the input document to an image of the corresponding page of the output document to generate an error score; and
if the error score exceeds a threshold, replacing the corresponding page of the output document with an image of the page of the input document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17.
18. A computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method of converting an input document having a fixed-layout into a portable format, the method comprising, for each page of the input document:
extracting a font used on the page;
adjusting metrics of the converted font;
converting the extracted font into a format compatible with a web browser program;
extracting text from the page;
rendering content of the page, other than the font and the text, as one or more content images;
storing the converted font, the extracted text and the one or more content images;
combining the converted font, the extracted text and the one or more content images as a corresponding page of an output document formatted according to a markup language compatible with the web browser program;
visually comparing an image of the page of the input document to an image of the corresponding page of the output document to generate an error score; and
if the error score exceeds a threshold, replacing the corresponding page of the output document with an image of the page of the input document.
19. An apparatus for converting a fixed-layout input document into a non-fixed format output document, comprising:
a display device for displaying documents;
a processor;
font-processing logic for:
  identifying a font used in the input document; and
  adjusting the font for use in the output document;
content-processing logic for:
  copying textual content of the input document; and
  adjusting the textual content;
image logic for:
  capturing images of full pages of a document; and
  capturing images of non-textual content of pages of the input document; and
error detection logic for:
  comparing a page of the output document to a corresponding page of the input document; and
  discarding the page of the output document if it differs from the corresponding page of the input document by more than a threshold.
Specification