Method and apparatus for producing a hybrid data structure for displaying a raster image

US 6,661,919 B2
Filed: 01/25/2002
Issued: 12/09/2003
Est. Priority Date: 08/31/1994
Status: Expired due to Fees

Abstract

A system for producing a raster image derived from coded and non-coded portions of a hybrid data structure from an input bitmap including (1) a data processing apparatus, (2) a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, (3) a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and non-coded data derived from portions of the input bitmap which do not correspond to the identifiable objects, and (4) an output device capable of developing a visually perceptible raster image derived from the hybrid data structure. The raster image includes raster images of the identifiable objects and raster images derived from portions of the input bitmap that do not correspond to the identifiable objects. The invention includes a method for producing a hybrid data structure for a bitmap of an image having the steps of: (1) inputting a signal comprising a bitmap into a digital processing apparatus, (2) partitioning the bitmap into a hierarchy of lexical units, (3) assigning labels to a label list for each lexical unit of a predetermined hierarchical level, where labels in the label list have an associated confidence level, and (4) storing each lexical unit in a hybrid data structure as either an identifiable object or a non-identifiable object.
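The four method steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Label`, `LexicalUnit`, `store`, and `THRESHOLD`, the 0 to 1 confidence scale, and the choice of the highest-confidence label are all assumptions made for illustration, and the "at or above the threshold" rule is taken from the claims rather than the abstract. The recognizer itself is out of scope, so label lists are supplied by hand.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.9  # assumed value; the abstract does not state one


@dataclass
class Label:
    text: str          # candidate interpretation of the lexical unit
    confidence: float  # confidence level associated with this label


@dataclass
class LexicalUnit:
    bitmap: bytes                               # pixels cut from the input bitmap
    labels: list = field(default_factory=list)  # the unit's label list


def store(unit, threshold=THRESHOLD):
    """Return the hybrid-structure entry for one lexical unit."""
    # Assumption: the best label decides whether the unit is identifiable.
    best = max(unit.labels, key=lambda label: label.confidence, default=None)
    if best is not None and best.confidence >= threshold:
        return {"kind": "identifiable", "code": best.text}
    # Below the threshold, keep the original pixels instead of a code.
    return {"kind": "non-identifiable", "bitmap": unit.bitmap}


units = [
    LexicalUnit(b"\x01\x02", [Label("raster", 0.97), Label("roster", 0.41)]),
    LexicalUnit(b"\x03\x04", [Label("hybnd", 0.35), Label("hybrid", 0.52)]),
]
hybrid = [store(u) for u in units]
```

In this sketch the first unit is stored as coded data and the second keeps its bitmap, because none of its labels reaches the assumed threshold.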

17 Claims

1. An electronic document comprising:
- an image raster representation of a document page, where display of the image raster representation generates a perceptible image of the document page, the image raster representation having a hybrid data structure including:
  
  a non-coded representation of the document page, the non-coded representation comprising bitmap representations of non-identifiable lexical objects within the document page, non-identifiable lexical objects being lexical objects having an assigned recognition confidence level below a threshold level;
  
  a coded representation of the document page, the coded representation comprising codes corresponding to identifiable lexical objects within the document page, identifiable lexical objects being lexical objects having an assigned recognition confidence level at or above the threshold level, the recognition confidence level being assigned to the object by a character recognition process; and
  
  linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word, wherein:
  
  the coded representation comprises a page description language description of a hidden page, the hidden page being a representation of the document page;
  
  each code comprises coded text;
  
  the position of the coded text on the hidden page being the same as the position of the corresponding lexical object on the image raster representation; and
  
  the linking information for each identifiable lexical object comprises the position of the coded text of the lexical object on the hidden page.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The electronic document of claim 1, wherein the linking information for each identifiable lexical object comprises a bounding box of the coded text of the lexical object on the hidden page.
  - 3. The electronic document of claim 1, wherein the matched area is displayed as a highlighted area of the input raster representation.
  - 4. The electronic document of claim 1, wherein the codes include coded text in Postscript format.
  - 5. The electronic document of claim 1, wherein the codes include ASCII character codes.
  - 6. The electronic document of claim 1, wherein the image raster representation of the document page is a PDF format.
  - 7. The electronic document of claim 1, wherein the image raster representation of the document page may be displayed in a first display mode or a second display mode, the first display mode displaying the image raster representation of the document page without displaying the coded representation of the document page, and the second display mode displaying selected codes over the non-coded representation of the document page.
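
The linking information of claims 1 and 2 ties each piece of coded text to an area of the raster, so a search hit on the hidden page can be mapped back to a region to highlight. The sketch below shows one way this lookup might work. `BBox`, `CodedObject`, and `find_matches` are hypothetical names, the coordinates are invented, and case-insensitive whole-word matching is an assumption; the claims do not say how a code is matched to a search word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    x: int
    y: int
    width: int
    height: int


@dataclass
class CodedObject:
    text: str   # coded text placed on the hidden page
    bbox: BBox  # linking information: where that text sits on the hidden page


def find_matches(coded_objects, search_word):
    """Return the raster areas whose coded text matches the search word."""
    needle = search_word.casefold()  # assumption: matching ignores case
    return [obj.bbox for obj in coded_objects if obj.text.casefold() == needle]


hidden_page = [
    CodedObject("hybrid", BBox(72, 100, 58, 14)),
    CodedObject("data", BBox(136, 100, 40, 14)),
    CodedObject("Hybrid", BBox(72, 130, 60, 14)),
]
matches = find_matches(hidden_page, "hybrid")
```

Because the coded text occupies the same position on the hidden page as the lexical object does on the raster, each returned box can be used directly as the highlighted area of claim 3.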

8. A method for displaying an electronic document, comprising:
- receiving an input raster representation of a document page;
  
  receiving a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page;
  
  receiving a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
  
  receiving linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
  
  putting into effect one of a first display mode and a second display mode; and
  
  displaying the input raster representation of the document page without displaying the coded representation of the document page while the first display mode is in effect, and displaying selected codes over the input raster representation while the second display mode is in effect.
- Dependent claims (9, 10)
  - 9. The method of claim 8, wherein displaying the electronic document comprises displaying the document page on a printing device.
  - 10. The method of claim 8, wherein displaying the electronic document comprises displaying the document page on a display screen.
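
The two display modes of claim 8 can be pictured with a toy renderer. This is only a sketch under stated assumptions: the raster is modeled as rows of characters (`#` for ink), codes are drawn by writing their characters into the grid, and `DisplayMode`, `render`, and the `(row, column)` positions are hypothetical names and data, not anything defined in the patent.

```python
from enum import Enum


class DisplayMode(Enum):
    RASTER_ONLY = 1  # first display mode: raster without the coded representation
    CODES_OVER = 2   # second display mode: selected codes over the raster


def render(raster_rows, coded, mode, selected=()):
    """Return the displayed page as a list of strings.

    raster_rows: the input raster, one string per row ('#' = ink).
    coded: mapping from code text to its (row, column) position.
    selected: codes to draw while the second display mode is in effect.
    """
    rows = [list(row) for row in raster_rows]
    if mode is DisplayMode.CODES_OVER:
        for text in selected:
            row, col = coded[text]
            for offset, ch in enumerate(text):
                rows[row][col + offset] = ch  # draw the code over the raster
    return ["".join(row) for row in rows]


raster = ["####.###", "##.#####"]
coded = {"ab": (0, 0), "cd": (1, 3)}
first = render(raster, coded, DisplayMode.RASTER_ONLY, selected=["ab"])
second = render(raster, coded, DisplayMode.CODES_OVER, selected=["cd"])
```

In the first mode the selection is ignored and the raster is shown unchanged; in the second mode only the selected code `cd` is drawn, and the rest of the raster remains visible.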

11. A method for displaying an electronic document, comprising:
- receiving an input raster representation of a document page;
  
  receiving a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page;
  
  receiving a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
  
  receiving linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
  
  putting into effect a first display mode or a second display mode;
  
  while the first display mode is in effect, displaying the input raster representation of the document page without displaying the coded representation of the document page and highlighting the matched area of the input raster representation in the displayed input raster representation; and
  
  while the second display mode is in effect, displaying selected codes rendered as images in place of corresponding areas of the input raster representation, and highlighting the matched area in the display.
- Dependent claims (12, 13)
  - 12. The method of claim 11, wherein displaying the electronic document comprises displaying the document page on a printing device.
  - 13. The method of claim 11, wherein displaying the electronic document comprises displaying the document page on a display screen.

14. A method for generating an electronic document, comprising:
- receiving an input raster representation of a document page;
  
  performing a character recognition process on the input bitmap to identify lexical objects within the document page;
  
  assigning a confidence level to each identified lexical object;
  
  generating a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page having an assigned confidence level below a threshold level;
  
  generating a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to lexical objects within the document page having an assigned confidence level at or above the threshold level; and
  
  generating linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word.
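
The generation steps of claim 14 can be sketched as a single pass over recognizer output. The function name `generate_document`, the tuple layout `(text, confidence, bbox, bitmap)`, the 0.9 threshold, and the use of list indices as link keys are assumptions made for illustration; the claim does not prescribe any of them.

```python
def generate_document(recognized, threshold=0.9):
    """Split recognized objects into the three parts named in claim 14.

    recognized: iterable of (text, confidence, bbox, bitmap) tuples, as a
    hypothetical recognizer might return them.
    """
    non_coded, coded, links = [], [], {}
    for text, confidence, bbox, bitmap in recognized:
        if confidence >= threshold:
            # At or above the threshold: keep the code and link it to its area.
            links[len(coded)] = bbox
            coded.append(text)
        else:
            # Below the threshold: keep the bitmap of the object instead.
            non_coded.append((bbox, bitmap))
    return non_coded, coded, links


recognized = [
    ("Method", 0.98, (10, 10, 60, 12), b"\x00"),
    ("appar?tus", 0.40, (75, 10, 80, 12), b"\x01"),
    ("raster", 0.90, (10, 30, 50, 12), b"\x02"),
]
non_coded, coded, links = generate_document(recognized)
```

Here the low-confidence word stays in the non-coded representation as a bitmap, while the two words at or above the assumed threshold become codes whose indices map to their raster areas.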

15. A system for generating an electronic document, comprising:
- means for receiving an input raster representation of a document page;
  
  means for performing a character recognition process on the input bitmap to identify lexical objects within the document page;
  
  means for assigning a confidence level to each identified lexical object;
  
  means for generating a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of lexical objects within the document page having an assigned confidence level below a threshold level;
  
  means for generating a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page having an assigned confidence level at or above the threshold level; and
  
  means for generating linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word.

16. A computer program product, tangibly stored on a computer-readable medium, for displaying an electronic document, the product comprising instructions operable to cause a programmable processor to:
- receive an input raster representation of a document page;
  
  receive a non-coded representation of the document generated from the input raster representation, the non-coded representation comprising bitmap representations of unidentifiable lexical objects within the document page;
  
  receive a coded representation of the document page generated from the input raster representation, the coded representation comprising codes corresponding to identifiable lexical objects within the document page;
  
  receive linking information associating each code with a corresponding area of the image raster representation for identifying a matched area of the image raster representation corresponding to a code that has been matched to a search word;
  
  put into effect a first display mode and a second display mode;
  
  display the input raster representation of the document page without displaying the coded representation of the document page while a first display mode is in effect, and display selected codes over the input raster representation while a different display mode is in effect.

17. An apparatus for producing a hybrid data structure from an input raster image which has been scanned and converted to an input bitmap, the hybrid data structure including coded portions which represent all of the identifiable objects contained within a first part of the input bitmap and a non-coded second part of the input bitmap representing non-identifiable objects, the coded portions themselves being capable of conversion to bitmap representations of the identifiable objects without access to external data, said apparatus comprising:
- means for performing a recognition process on the input bitmap;
  
  means for representing the identifiable objects as coded data; and
  
  means for creating the hybrid data structure that may be used to reproduce the input raster image, wherein the hybrid data structure has two parts, a first part corresponding to the first part of the input bitmap that incorporates solely the coded portions, and a second part corresponding to the non-coded second part of the input bitmap that incorporates solely the non-coded second part of the input bitmap, and wherein the hybrid data structure, without access to external data, is sufficient to reproduce the entire input raster image, wherein the means for creating the hybrid data structure further includes means for assigning a recognition confidence level to each object such that the identifiable objects are objects having a confidence level at or above a threshold level, and the non-identifiable objects are objects having a confidence level below a threshold level.

Current Assignee: Adobe Systems Incorporated (Adobe Inc.)
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Nicholson, Dennis G.; King, James C.
Primary Examiner(s): Johns, Andrew W.
Assistant Examiner(s): Alavi, Amir
Application Number: US10/054,944
Publication Number: US 20020067859A1
Time in Patent Office: 683 Days
Field of Search: 382/171, 382/173, 382/177, 382/180, 382/181, 382/217, 382/258, 382/276, 707/509, 707/510, 707/514, 707/516, 707/526, 707/542
US Class Current: 382/173

CPC Class Codes
G06V 30/10   Character recognition
G06V 30/127   with the intervention of an...
G06V 30/268   Lexical context
G06V 30/40   Document-oriented image-bas...
H04N 1/4115   involving the recognition o...