Polygon-based technique for the automatic classification of text and graphics components from digitized paper-based forms

US 5,050,222 A
Filed: 05/21/1990
Issued: 09/17/1991
Est. Priority Date: 05/21/1990
Status: Expired due to Term

Abstract

A polygon-based graphics/text separation method comprises two sequential processes. First, a raster-to-contour-vector conversion step converts a digitized bitmap into a collection of simple polygons. Next, a component classification process extracts six particularly defined features from each polygon-based component to enable the separation of graphics polygons from text polygons. Graphics polygons are further classified into four subclasses. Text polygons are grouped into polygon strings (text strings). Each string contains a sequence of segmented character contour polygons, ready for an optical character recognition algorithm to convert into computer-readable ASCII characters.
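The abstract does not name the six features, so the following is only a minimal sketch of what per-polygon feature extraction could look like. The four features chosen here (bounding-box width and height, enclosed area, perimeter), the `PolygonFeatures` type, and the `polygon_features` function are illustrative assumptions, not the patent's definitions.

```python
from dataclasses import dataclass
from math import hypot

Point = tuple[float, float]


@dataclass
class PolygonFeatures:
    """Hypothetical feature set; the patent's six features are not listed here."""
    width: float
    height: float
    area: float
    perimeter: float


def polygon_features(vertices: list[Point]) -> PolygonFeatures:
    """Compute simple geometric features of a closed polygon.

    `vertices` lists the corners in order; the closing edge back to the
    first vertex is implied.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    closed = vertices + vertices[:1]
    twice_area = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        twice_area += x0 * y1 - x1 * y0        # shoelace formula term
        perimeter += hypot(x1 - x0, y1 - y0)   # edge length
    return PolygonFeatures(
        width=max(xs) - min(xs),
        height=max(ys) - min(ys),
        area=abs(twice_area) / 2,
        perimeter=perimeter,
    )
```

A classifier in this style would compare such features against thresholds to decide whether a polygon is more likely a character or a graphic element.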

3 Claims

1. A method of classifying components of an image into text or graphics comprising the steps of:

a) digitizing the image to form a bit map representation of the image;

b) extracting a set of contour vectors from the bit map image; and

c) extracting from the set of contour vectors a set of polygon features;

d) employing the set of polygon features to classify a first set of graphics components;

e) separating the image contour vectors into inner and outer contours;

f) sorting all of the inner and outer contours according to horizontal location in their respective group;

g) employing the inner and outer contours and the row segregation to classify a second set of graphic components;

h) employing the polygon features and row segmentation to detect space between polygons in a horizontal projection between two consecutive polygons to identify a group of object strings;

i) extracting from the group of textual strings a third set of graphic components in the form of single line text strings.
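Steps e) and f) can be illustrated with a small sketch. It assumes that outer contours are traced counter-clockwise and inner contours (holes) clockwise in a y-up coordinate system, so the sign of the shoelace area tells them apart; the claim does not state how the separation is made, and the function names are hypothetical. In image coordinates with y pointing down the signs would be reversed.

```python
Point = tuple[float, float]
Contour = list[Point]


def signed_area(contour: Contour) -> float:
    """Shoelace area: positive for counter-clockwise, negative for clockwise."""
    closed = contour + contour[:1]
    return sum(x0 * y1 - x1 * y0
               for (x0, y0), (x1, y1) in zip(closed, closed[1:])) / 2


def leftmost_x(contour: Contour) -> float:
    """Horizontal location of a contour, taken as its smallest x coordinate."""
    return min(x for x, _ in contour)


def split_and_sort(contours: list[Contour]) -> tuple[list[Contour], list[Contour]]:
    """Separate contours into (outer, inner), each sorted left to right."""
    outer = [c for c in contours if signed_area(c) > 0]
    inner = [c for c in contours if signed_area(c) <= 0]
    return sorted(outer, key=leftmost_x), sorted(inner, key=leftmost_x)
```

Sorting each group by its leftmost coordinate is one simple reading of "according to horizontal location"; other keys, such as the bounding-box center, would serve equally well for illustration.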

2. A method for the automatic classification of text and graphics components on a paper document comprising the steps of:

a) scanning the image to produce a bit map digital image;

b) generating from the bit map image a set of contour vectors;

c) extracting from the set of contour vectors a set of polygon features;

d) employing the set of polygon features to classify a first set of graphics components;

e) separating the image contour vectors into inner and outer contours;

f) sorting all of the inner and outer contours according to horizontal location in their respective groups;

g) employing the inner and outer contours and the row segregation to classify a second set of graphic components;

h) employing the polygon features and row segmentation to detect space between polygons in a horizontal projection between two consecutive polygons to identify a group of object strings;

i) extracting from the group of object strings a third set of graphic components;

j) classifying the remaining objects in the object string as text; and

k) extracting from the group of object strings a third set of graphic components in the form of single line text strings.
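Step h) relies on the space between consecutive polygons in a horizontal projection. The sketch below shows one way such gap detection could work for the polygons of a single row, each reduced to the horizontal extent `(left, right)` of its bounding box. The `max_gap` threshold and the function name are assumptions made for illustration; the claim does not give a value or a rule.

```python
Extent = tuple[float, float]  # (left, right) of a polygon's bounding box


def group_into_strings(extents: list[Extent], max_gap: float) -> list[list[Extent]]:
    """Group the polygons of one row into strings.

    Consecutive polygons stay in the same string while the horizontal
    space between them is at most `max_gap`; a wider space starts a new string.
    """
    strings: list[list[Extent]] = []
    current_right = 0.0
    for left, right in sorted(extents):
        if strings and left - current_right <= max_gap:
            strings[-1].append((left, right))
            current_right = max(current_right, right)
        else:
            strings.append([(left, right)])
            current_right = right
    return strings
```

For example, with `max_gap=3` the extents `(0, 5)` and `(7, 12)` fall into one string because the space between them is 2, while a polygon starting at 30 opens a new string.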

3. A method for the automatic classification of text and graphics components on a paper document comprising the steps of:

a) raster scanning said document;

b) generating as the document is being scanned, a bit map representation of the document such that wherever a transition is detected contour vectorization is used to convert the bit representation into a collection of closed polygons formed by a series of vectors;

c) calculating a plurality of features associated with the geometric parameters of the polygon;

d) establishing a first threshold level to be applied to the collection of polygons such that the calculated value for that polygon is above said first threshold identifying the polygon as a large graphic;

e) sorting the collection of polygons into inner and outer contours;

f) sorting all of the inner and outer contours according to horizontal location in their respective groups;

g) establishing polygon links for all polygons having a geometrical relationship in the horizontal direction;

h) using contour linking to examine the geometrical relationship of the coordinates of both the outer and inner contours of a polygon to determine if they are geometrically overlapping;

i) separating inner contours of polygons into either graphics or text after comparing the features of the polygon to a predetermined threshold; and

j) comparing a subset of features of the remaining polygons which lie on the same horizontal level with a predetermined threshold and removing single line text strings.
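Steps d) and h) can be sketched as follows. The sketch assumes that the thresholded value is the larger side of a polygon's bounding box and that "geometrically overlapping" can be approximated by bounding-box overlap. Both choices, along with the `Box` type, the function names, and `size_threshold`, are illustrative assumptions rather than the claimed method.

```python
from typing import NamedTuple

Point = tuple[float, float]
Contour = list[Point]


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def bounding_box(contour: Contour) -> Box:
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return Box(min(xs), min(ys), max(xs), max(ys))


def is_large_graphic(box: Box, size_threshold: float) -> bool:
    """Flag a polygon whose larger side exceeds the (hypothetical) threshold."""
    return max(box.right - box.left, box.bottom - box.top) > size_threshold


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one point."""
    return (a.left <= b.right and b.left <= a.right
            and a.top <= b.bottom and b.top <= a.bottom)


def link_contours(outer: list[Contour], inner: list[Contour]) -> dict[int, list[int]]:
    """Map each outer contour index to the inner contour indices it overlaps."""
    outer_boxes = [bounding_box(c) for c in outer]
    inner_boxes = [bounding_box(c) for c in inner]
    return {
        i: [j for j, ib in enumerate(inner_boxes) if boxes_overlap(ob, ib)]
        for i, ob in enumerate(outer_boxes)
    }
```

An outer contour linked to one or more inner contours might be a character with holes, such as "O" or "B", or a frame enclosing other content; telling these apart is left to the later threshold comparisons described in steps i) and j).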

Current Assignee: Eastman Kodak Company
Original Assignee: Eastman Kodak Company
Inventors: Lee, Yongchun
Primary Examiner(s): Moore, David K.
Assistant Examiner(s): COUSO, YON JUNG
Application Number: US07/526,928
Time in Patent Office: 484 Days
Field of Search: 382/21, 382/22, 382/25, 382/36, 382/38, 358/462, 358/464
US Class Current: 382/176
CPC Class Codes:
G06V 30/10   Character recognition
G06V 30/1444   Selective acquisition, loca...
G06V 30/182   by coding the contour of th...
