Method and apparatus for producing a hybrid data structure for displaying a raster image
Abstract
A system for producing a raster image derived from coded and non-coded portions of a hybrid data structure from an input bitmap including (1) a data processing apparatus, (2) a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, (3) a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and non-coded data derived from portions of the input bitmap which do not correspond to the identifiable objects, and (4) an output device capable of developing a visually perceptible raster image derived from the hybrid data structure. The raster image includes raster images of the identifiable objects and raster images derived from portions of the input bitmap that do not correspond to the identifiable objects. The invention includes a method for producing a hybrid data structure for a bitmap of an image having the steps of: (1) inputting a signal comprising a bitmap into a digital processing apparatus, (2) partitioning the bitmap into a hierarchy of lexical units, (3) assigning labels to a label list for each lexical unit of a predetermined hierarchical level, where labels in the label list have an associated confidence level, and (4) storing each lexical unit in a hybrid data structure as either an identifiable object or a non-identifiable object.
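The storing step of the method summarized above can be illustrated with a minimal Python sketch. It assumes recognition has already partitioned the bitmap into word-level lexical units and produced a label list of (text, confidence) candidates for each one. The names `LexicalUnit`, `HybridStructure`, and `build_hybrid`, and the threshold value 0.8, are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    bbox: tuple          # (x, y, width, height) within the input bitmap
    labels: list         # label list: [(text, confidence), ...]
    pixels: bytes = b""  # bitmap crop covering this unit (assumed format)


@dataclass
class HybridStructure:
    coded: list = field(default_factory=list)      # (text, bbox) pairs
    non_coded: list = field(default_factory=list)  # (pixels, bbox) pairs


def build_hybrid(units, threshold=0.8):
    """Store each unit as coded text or as a bitmap, based on confidence."""
    hybrid = HybridStructure()
    for unit in units:
        # Pick the most confident label; an empty label list counts as unrecognized.
        text, conf = max(unit.labels, key=lambda lab: lab[1], default=(None, 0.0))
        if text is not None and conf >= threshold:
            hybrid.coded.append((text, unit.bbox))             # identifiable object
        else:
            hybrid.non_coded.append((unit.pixels, unit.bbox))  # non-identifiable object
    return hybrid
```

A unit whose best label meets the threshold is kept only as a code plus its position, while every other unit keeps its original pixels, so the page can still be reproduced in full.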
17 Claims
1. An electronic document comprising:
an image raster representation of a document page, where display of the image raster representation generates a perceptible image of the document page, the image raster representation having a hybrid data structure including:
a non-coded representation of the document page, the non-coded representation comprising bitmap representations of non-identifiable lexical objects within the document page, non-identifiable lexical objects being lexical objects having an assigned recognition confidence level below a threshold level;
a coded representation of the document page, the coded representation comprising codes corresponding to identifiable lexical objects within the document page, identifiable lexical objects being lexical objects having an assigned recognition confidence level at or above the threshold level, the recognition confidence level being assigned to the object by a character recognition process; and
linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word, wherein:
the coded representation comprises a page description language description of a hidden page, the hidden page being a representation of the document page;
each code comprises coded text;
the position of the coded text on the hidden page being the same as the position of the corresponding lexical object on the image raster representation; and
the linking information for each identifiable lexical object comprises the position of the coded text of the lexical object on the hidden page.
Dependent claims: 2, 3, 4, 5, 6, 7
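The role of the linking information in claim 1 can be sketched in Python. Because the coded text sits on the hidden page at the same position as the corresponding lexical object on the raster, a search hit on the hidden page directly yields the area to mark on the image. The `HiddenWord` type, the `find_matched_areas` function, and the case-insensitive comparison are assumptions made for illustration, not details taken from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiddenWord:
    text: str    # coded text placed on the hidden page
    x: int       # position on the hidden page, which by construction
    y: int       # equals the position on the image raster representation
    width: int
    height: int


def find_matched_areas(hidden_page, search_word):
    """Return raster areas (x, y, width, height) whose code matches the search word."""
    needle = search_word.casefold()
    return [
        (w.x, w.y, w.width, w.height)
        for w in hidden_page
        if w.text.casefold() == needle
    ]
```

No separate lookup table is needed in this sketch: the position stored with each code serves as the link to the matched area of the raster.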
8. A method for displaying an electronic document, comprising:
receiving an input raster representation of a document page;
receiving a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page;
receiving a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
receiving linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
putting into effect one of a first display mode and a second display mode; and
displaying the input raster representation of the document page without displaying the coded representation of the document page while the first display mode is in effect, and displaying selected codes over the input raster representation while the second display mode is in effect.
Dependent claims: 9, 10
11. A method for displaying an electronic document, comprising:
receiving an input raster representation of a document page;
receiving a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page;
receiving a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
receiving linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
putting into effect a first display mode or a second display mode;
while the first display mode is in effect, displaying the input raster representation of the document page without displaying the coded representation of the document page and highlighting the matched area of the input raster representation in the displayed input raster representation; and
while the second display mode is in effect, displaying selected codes rendered as images in place of corresponding areas of the input raster representation, and highlighting the matched area in the display.
Dependent claims: 12, 13
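The two display modes of claim 11 can be sketched in Python as a function that builds an ordered list of drawing operations for one page. The operation names, the `DisplayMode` enum, and the `compose_display` function are hypothetical; a real viewer would hand these operations to its own renderer.

```python
from enum import Enum


class DisplayMode(Enum):
    RASTER_ONLY = 1      # first display mode: input raster only
    CODES_AS_IMAGES = 2  # second display mode: selected codes rendered in place


def compose_display(mode, selected_codes, matched_area=None):
    """Return draw operations as (operation, area, text) tuples.

    selected_codes: list of (text, area) pairs, area = (x, y, width, height).
    matched_area: area of the raster matched to a search word, if any.
    """
    ops = [("draw_raster", None, None)]  # the input raster is always the base layer
    if mode is DisplayMode.CODES_AS_IMAGES:
        for text, area in selected_codes:
            # Rendered text replaces the corresponding raster area.
            ops.append(("draw_rendered_text", area, text))
    if matched_area is not None:
        ops.append(("highlight", matched_area, None))  # both modes highlight the match
    return ops
```

In the first mode the coded representation never reaches the screen, and only the highlight reveals that a code was matched.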
14. A method for generating an electronic document, comprising:
receiving an input raster representation of a document page;
performing a character recognition process on the input bitmap to identify lexical objects within the document page;
assigning a confidence level to each identified lexical object;
generating a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page having an assigned confidence level below a threshold level;
generating a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to lexical objects within the document page having an assigned confidence level at or above the threshold level; and
generating linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word.
15. A system for generating an electronic document, comprising:
means for receiving an input raster representation of a document page;
means for performing a character recognition process on the input bitmap to identify lexical objects within the document page;
means for assigning a confidence level to each identified lexical object;
means for generating a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page having an assigned confidence level below a threshold level;
means for generating a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page having an assigned confidence level at or above the threshold level; and
means for generating linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word.
16. A computer program product, tangibly stored on a computer-readable medium, for displaying an electronic document, the product comprising instructions operable to cause a programmable processor to:
receive an input raster representation of a document page;
receive a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of unidentifiable lexical objects within the document page;
receive a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
receive linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
put into effect a first display mode and a second display mode;
display the input raster representation of the document page without displaying the coded representation of the document page while a first display mode is in effect, and display selected codes over the input raster representation while a different display mode is in effect.
17. An apparatus for producing a hybrid data structure from an input raster image which has been scanned and converted to an input bitmap, the hybrid data structure including coded portions which represent all of the identifiable objects contained within a first part of the input bitmap and a non-coded second part of the input bitmap representing non-identifiable objects, the coded portions themselves being capable of conversion to bitmap representations of the identifiable objects without access to external data, said apparatus comprising:
means for performing a recognition process on the input bitmap;
means for representing the identifiable objects as coded data; and
means for creating the hybrid data structure that may be used to reproduce the input raster image, wherein the hybrid data structure has two parts, a first part corresponding to the first part of the input bitmap that incorporates solely the coded portions, and a second part corresponding to the non-coded second part of the input bitmap that incorporates solely the non-coded second part of the input bitmap, and wherein the hybrid data structure, without access to external data, is sufficient to reproduce the entire input raster image, wherein the means for creating the hybrid data structure further includes means for assigning a recognition confidence level to each object such that the identifiable objects are objects having a confidence level at or above a threshold level, and the non-identifiable objects are objects having a confidence level below a threshold level.
Specification