File structure for scanned documents
Abstract
The present invention provides an electronic file and file structure solution for comprehensive management of documents captured as scanned objects, raster objects or representation. Using the present invention a representation of a document is created using any type of imaging device. The representation includes objects present in the document. The location in the document of the objects in the plurality of objects is identified. One copy of each different object in the plurality of objects is stored in the file. The location of objects in the plurality of objects are stored in the file in a spatial layout index. The file thus contains all of the information required to faithfully reproduce the original document. In order to reconstruct the document, the objects are placed at the locations identified by the spatial layout index.
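The storage scheme in the abstract (one stored copy of each distinct object, plus a spatial layout index used to place the objects again) can be illustrated with a minimal sketch. The names `DocumentFile`, `add`, and `reconstruct` are hypothetical and do not come from the patent, and treating two objects as "the same" only when their pixels are exactly equal is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field

# A bitmap is modelled as rows of 0/1 pixels (an assumption for illustration).
Bitmap = tuple[tuple[int, ...], ...]


@dataclass
class DocumentFile:
    # One stored copy of each distinct object.
    objects: list[Bitmap] = field(default_factory=list)
    # Spatial layout index: (object_id, x, y) for every occurrence.
    layout: list[tuple[int, int, int]] = field(default_factory=list)

    def add(self, bitmap: Bitmap, x: int, y: int) -> None:
        """Record an occurrence, storing the bitmap only if it is new."""
        try:
            object_id = self.objects.index(bitmap)
        except ValueError:
            object_id = len(self.objects)
            self.objects.append(bitmap)
        self.layout.append((object_id, x, y))

    def reconstruct(self, width: int, height: int) -> list[list[int]]:
        """Place every object at its indexed location on a blank page."""
        page = [[0] * width for _ in range(height)]
        for object_id, x, y in self.layout:
            for dy, row in enumerate(self.objects[object_id]):
                for dx, pixel in enumerate(row):
                    if pixel:
                        page[y + dy][x + dx] = 1
        return page


glyph: Bitmap = ((1, 0), (1, 1))
doc = DocumentFile()
doc.add(glyph, 0, 0)
doc.add(glyph, 3, 1)   # second occurrence reuses the stored copy
page = doc.reconstruct(width=5, height=3)
```

In this sketch the glyph appears twice on the page but is stored once; only the layout index grows with each occurrence.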
12 Claims
1. A method for producing a file structure for representing a scanned image of at least a portion of a physical document, comprising:
receiving a resolution dependent bitmap image of a physical document, said image being produced by an optical scanning device including a plurality of bitmapped features, said plurality of bitmapped features in said image having no initial plain text identities;
locating said plurality of bitmapped features in said image and inputting said plurality of bitmapped features into a text recognition system which obtains output plain text values for a subset of the bitmapped features in said plurality of bitmapped features, where said output plain text values may be single character codes or strings of character codes;
classifying as non-textual those bitmapped features in the plurality of bitmapped features that are not members of said subset for which plain text values were obtained, and as textual those bitmapped features which are members of said subset for which plain text values were obtained from said recognition system;
using said classifications to group textual bitmapped features into textual records, one textual record per textual bitmapped feature, and each textual record listing at least the following items;
the output plain text value as provided by said textual recognition system, the spatial location of the bitmapped feature in said image, and a bitmap of the bitmapped feature;
thereby making the image searchable by enabling the comparison of plain text, as provided by a query search engine, to be compared with plain text values in said textual records, thereby locating any textual bitmaps in the image that match the query plain text;
grouping non-textual bitmapped features into non-textual records, each non-textual record listing at least the following items;
the spatial location in the bitmapped feature in said image, and a bitmap of the bitmapped feature;
generating a file comprising said textual and non-textual records so as to represent the image and a plain text interpretation of any textual bitmaps therein.
Dependent claims: 2, 3, 4, 5, 6
comparing two or more said bitmaps, in one or more said textual and non-textual records in said generated file, for optically similar shape properties, and if sufficiently similar then any two or more said bitmaps are replaced by a single bitmap in said records.
5. The method of claim 3, including prior to outputting said file:
comparing two or more said bitmaps, in one or more said textual and non-textual records in said generated file, for optically similar shape properties, and if sufficiently similar then any two or more said bitmaps are replaced by a single bitmap in said records.
6. The method of claim 3, including producing an indexed structure to the textual records, said index using the plain text values in said textual records as keys to locating said textual records.
7. A file structure produced according to a method for producing said file structure for representing a scanned image of at least a portion of a physical document, comprising:
receiving a resolution dependent bitmap image of a physical document, said image being produced by an optical scanning device including a plurality of bitmapped features, said plurality of bitmapped features in said image having no initial plain text identities;
locating said plurality of bitmapped features in said image and inputting said plurality of bitmapped features into a text recognition system which obtains output plain text values for a subset of the bitmapped features in said plurality of bitmapped features, where said output plain text values may be single character codes or strings of character codes;
classifying as non-textual those bitmapped features in the plurality of bitmapped features that are not members of said subset for which plain text values were obtained, and as textual those bitmapped features which are members of said subset for which plain text values were obtained from said recognition system;
using said classifications to group textual bitmapped features into textual records, one textual record per textual bitmapped feature, and each textual record listing at least the following items;
the output plain text value as provided by said textual recognition system, the spatial location of the bitmapped feature in said image, and a bitmap of the bitmapped feature;
thereby making the image searchable by enabling the comparison of plain text, as provided by a query search engine, to be compared with plain text values in said textual records, thereby locating any textual bitmaps in the image that match the query plain text;
grouping non-textual bitmapped features into non-textual records, each non-textual record listing at least the following items;
the spatial location in the bitmapped feature in said image, and a bitmap of the bitmapped feature;
generating a file comprising said textual and non-textual records so as to represent the image and a plain text interpretation of any textual bitmaps therein.
Dependent claims: 8, 9, 10, 11, 12
comparing two or more said bitmaps, in one or more said textual and non-textual records in said generated file, for optically similar shape properties, and if sufficiently similar then any two or more said bitmaps are replaced by a single bitmap in said records.
11. The file structure of claim 9, said method of producing including prior to outputting said file:
comparing two or more said bitmaps, in one or more said textual and non-textual records in said generated file, for optically similar shape properties, and if sufficiently similar then any two or more said bitmaps are replaced by a single bitmap in said records.
12. The file structure of claim 10, said method of producing including producing an indexed structure to the textual records, said index using the plain text values in said textual records as keys to locating said textual records.
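The record-building, search, bitmap-sharing, and indexing steps recited in the claims above can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: every name (`Feature`, `TextualRecord`, `NonTextualRecord`, `build_file`, `search`, `build_index`, `share_similar_bitmaps`) is hypothetical, the text recognition system is stood in for by a caller-supplied function that returns `None` when no plain text value is obtained, and "optically similar" is approximated by the fraction of differing pixels between equally sized bitmaps.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional

Bitmap = tuple[tuple[int, ...], ...]  # rows of 0/1 pixels (assumption)


@dataclass
class Feature:              # a located bitmapped feature, not yet classified
    x: int
    y: int
    bitmap: Bitmap


@dataclass
class TextualRecord:        # plain text value, spatial location, bitmap
    text: str
    x: int
    y: int
    bitmap: Bitmap


@dataclass
class NonTextualRecord:     # spatial location and bitmap only
    x: int
    y: int
    bitmap: Bitmap


def build_file(features: list[Feature],
               recognize: Callable[[Bitmap], Optional[str]]) -> dict:
    """Classify each feature by whether recognition returned a text value."""
    textual, non_textual = [], []
    for f in features:
        text = recognize(f.bitmap)
        if text is None:
            non_textual.append(NonTextualRecord(f.x, f.y, f.bitmap))
        else:
            textual.append(TextualRecord(text, f.x, f.y, f.bitmap))
    return {"textual": textual, "non_textual": non_textual}


def search(file: dict, query: str) -> list[tuple[int, int]]:
    """Compare query plain text with record text; return matching locations."""
    return [(r.x, r.y) for r in file["textual"] if r.text == query]


def build_index(textual: list[TextualRecord]) -> dict[str, list[int]]:
    """Index keyed by plain text value, pointing at record positions."""
    index = defaultdict(list)
    for position, record in enumerate(textual):
        index[record.text].append(position)
    return dict(index)


def differing_fraction(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ; 1.0 when the shapes do not match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 1.0
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


def share_similar_bitmaps(records: list, tolerance: float) -> list[Bitmap]:
    """Replace sufficiently similar bitmaps with one shared representative."""
    representatives: list[Bitmap] = []
    for record in records:
        for rep in representatives:
            if differing_fraction(record.bitmap, rep) <= tolerance:
                record.bitmap = rep
                break
        else:
            representatives.append(record.bitmap)
    return representatives


word_a: Bitmap = ((1, 1), (1, 0))
word_b: Bitmap = ((1, 1), (1, 1))   # same word, one pixel of scan noise
logo: Bitmap = ((0, 0), (0, 1))
ocr = {word_a: "cat", word_b: "cat"}.get   # stand-in recognizer

file = build_file(
    [Feature(0, 0, word_a), Feature(5, 5, logo), Feature(10, 0, word_b)], ocr
)
hits = search(file, "cat")
index = build_index(file["textual"])
shared = share_similar_bitmaps(file["textual"] + file["non_textual"], 0.25)
```

Here the two scans of the same word become two textual records that end up pointing at a single shared bitmap, while the unrecognized logo is kept as a non-textual record with its own bitmap.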
Specification