Automated capture of technical documents for electronic review and distribution
First Claim
1. A method of automatically coding, managing, and displaying a document in digital form, the method comprising the steps of:
scanning a document into an image format suitable for display purposes;
embedding the image format into a hypertext-based meta-language format including one or more hypertext links;
segmenting the hypertext-based document into one or more structured blocks;
decoding a particular block into text, images, and tables, as appropriate, in accordance with a block-specific decoding strategy; and
embedding the text derived from the block decoding into a conventional document format, enabling the use of a text-based search method.
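The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Block` type, the three decoder functions, and `build_page` are hypothetical names introduced here. Real OCR and table recognition are replaced by simple string stand-ins, and HTML with a client-side image map is assumed as the hypertext-based format.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List, Tuple


@dataclass
class Block:
    """One structured block of a segmented page (all fields are hypothetical)."""
    kind: str                          # "text", "table", or "image"
    region: Tuple[int, int, int, int]  # left, top, right, bottom in page pixels
    raw: str                           # stand-in for the recognized content


def decode_text(block: Block) -> str:
    # Stand-in for OCR: collapse runs of whitespace in the recognized string.
    return " ".join(block.raw.split())


def decode_table(block: Block) -> str:
    # Stand-in for table recognition: rows split on newlines, cells on "|".
    rows = [[cell.strip() for cell in line.split("|")]
            for line in block.raw.splitlines()]
    return "\n".join("\t".join(row) for row in rows)


def decode_image(block: Block) -> str:
    # Images stay as pixels and contribute no searchable text.
    return ""


# Block-specific decoding strategies, selected by block kind.
STRATEGIES: Dict[str, Callable[[Block], str]] = {
    "text": decode_text,
    "table": decode_table,
    "image": decode_image,
}


def decode_block(block: Block) -> str:
    return STRATEGIES[block.kind](block)


def build_page(image_url: str, blocks: List[Block]) -> str:
    """Embed the page image and the decoded text in one HTML fragment."""
    areas, texts = [], []
    for i, block in enumerate(blocks):
        text = decode_block(block)
        if not text:
            continue  # nothing searchable to link to
        coords = ",".join(str(v) for v in block.region)
        # Hypertext link from the image region to its decoded text.
        areas.append(f'<area shape="rect" coords="{coords}" href="#block{i}">')
        texts.append(f'<pre id="block{i}">{escape(text)}</pre>')
    return "\n".join(
        [f'<img src="{escape(image_url)}" usemap="#blocks">',
         '<map name="blocks">', *areas, "</map>", *texts]
    )
```

Calling `build_page("page1.gif", blocks)` returns a fragment in which the scanned image remains the displayed form of the page, while the decoded text sits alongside it where an ordinary text search can reach it.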
Abstract
Paper documents are automatically converted into a hypertext-based format so that they can be accessed through electronic networks, including the Internet, or via non-volatile transfer media such as disks or CD-ROMs. The invention generalizes the concept of form-based recognition while extending the concept of document retrieval to include document structure knowledge, thereby providing the advantages found in both form-based recognition (utilization of document structure knowledge) and image-based information retrieval (robustness). In a preferred embodiment, a method according to the invention enables direct translation of a paper document into a hypertext-based format so that it may be directly accessed through the Internet using current browsers such as Mosaic, Netscape and Microsoft's Explorer.
18 Claims
1. A method of automatically coding, managing, and displaying a document in digital form, the method comprising the steps of:
scanning a document into an image format suitable for display purposes;
embedding the image format into a hypertext-based meta-language format including one or more hypertext links;
segmenting the hypertext-based document into one or more structured blocks;
decoding a particular block into text, images, and tables, as appropriate, in accordance with a block-specific decoding strategy; and
embedding the text derived from the block decoding into a conventional document format, enabling the use of a text-based search method.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method of digital document encoding and management, comprising the steps of:
scanning a document into one or more page images;
embedding at least one of the page images into a hypertext-based meta-language format which enables a user to automatically or manually segment the page images into document structure blocks;
decoding a particular block into text plus images and tables, as appropriate, in accordance with a block-specific decoding strategy, the text including one or more non-proofread (dirty) sections; and
embedding the text, including the dirty sections, into a conventional document format, enabling the use of a text-based search method.
Dependent claims: 14, 15, 16, 17, 18
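Claim 13 keeps non-proofread (dirty) text in the searchable document. The sketch below shows one way such text could still be searched. Approximate matching with Python's `difflib` is an assumption made here for illustration; the claim itself says only "a text-based search method". The function name `fuzzy_find` and the 0.8 threshold are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def fuzzy_find(query: str, dirty_text: str,
               threshold: float = 0.8) -> List[Tuple[int, str]]:
    """Return (word index, word) pairs whose similarity to query meets threshold."""
    hits = []
    needle = query.lower()
    for i, word in enumerate(dirty_text.split()):
        # Ignore surrounding punctuation and case before comparing.
        candidate = word.strip(".,;:()").lower()
        if SequenceMatcher(None, needle, candidate).ratio() >= threshold:
            hits.append((i, word))
    return hits


# "docurnent" mimics a common OCR confusion of "m" with "rn".
sample = "Scan the docurnent, then index the document."
matches = fuzzy_find("document", sample)
```

With this sample, `matches` contains both the misrecognized word and the clean one, so a recognition error in a dirty section need not hide the passage from a search.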
Specification