Method for image segmentation and classification of image elements for documents processing

US 5,751,850 A
Filed: 10/04/1996
Issued: 05/12/1998
Est. Priority Date: 06/30/1993
Status: Expired due to Term

First Claim

1. Method for removing unwanted information, lines or printed characters from documents prior to character recognition of written information, comprising the steps of:

1) segmentation of an image into image elements;

searching each image element to determine if it comprises more than one image element by scanning a pixel array in a horizontal and a vertical direction, and identifying a common border between two parallel pixel runs, said common border having a length below a threshold value;

cutting a connection between said two parallel runs at said common border to break an image element having said common border into several image elements;

2) extraction of feature information from each image element;

3) classification of each of the image elements;

4) removal of those image elements which are classified as unwanted information, lines and printed characters; and

5) processing remaining image elements for writing recognition.
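The cutting step of claim 1 can be illustrated with a minimal Python sketch. It assumes a binary image given as a set of foreground (row, column) coordinates and takes the "common border" of two parallel runs on adjacent rows to be the number of columns they share. The vertical scan is assumed to be the same procedure applied to the transposed image. The function names, the union-find bookkeeping and the `from_ascii` helper are illustrative and do not come from the patent.

```python
from collections import defaultdict

Pixel = tuple[int, int]     # (row, col) of a foreground pixel
Run = tuple[int, int, int]  # (row, first col, last col), both cols inclusive


def horizontal_runs(pixels: set[Pixel]) -> list[Run]:
    """Collect maximal horizontal runs of foreground pixels, row by row."""
    by_row: dict[int, list[int]] = defaultdict(list)
    for r, c in pixels:
        by_row[r].append(c)
    runs: list[Run] = []
    for r in sorted(by_row):
        cols = sorted(by_row[r])
        start = prev = cols[0]
        for c in cols[1:]:
            if c != prev + 1:  # a gap ends the current run
                runs.append((r, start, prev))
                start = c
            prev = c
        runs.append((r, start, prev))
    return runs


def split_runs(pixels: set[Pixel], threshold: int) -> list[set[Pixel]]:
    """Join runs on adjacent rows only if their common border is >= threshold."""
    runs = horizontal_runs(pixels)
    parent = list(range(len(runs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    runs_by_row: dict[int, list[int]] = defaultdict(list)
    for i, (r, _, _) in enumerate(runs):
        runs_by_row[r].append(i)

    for i, (r, s, e) in enumerate(runs):
        for j in runs_by_row.get(r + 1, []):
            _, s2, e2 = runs[j]
            border = min(e, e2) - max(s, s2) + 1  # shared columns; <= 0 if disjoint
            if border > 0 and border >= threshold:
                parent[find(i)] = find(j)
            # otherwise the connection is cut (or never existed)

    elements: dict[int, set[Pixel]] = defaultdict(set)
    for i, (r, s, e) in enumerate(runs):
        elements[find(i)].update((r, c) for c in range(s, e + 1))
    return list(elements.values())


def transpose(pixels: set[Pixel]) -> set[Pixel]:
    return {(c, r) for r, c in pixels}


def segment(pixels: set[Pixel], threshold: int) -> list[set[Pixel]]:
    """Horizontal scan first, then a vertical scan (via transposition) on each part."""
    result: list[set[Pixel]] = []
    for part in split_runs(pixels, threshold):
        for sub in split_runs(transpose(part), threshold):
            result.append(transpose(sub))
    return result


def from_ascii(lines: list[str]) -> set[Pixel]:
    return {(r, c) for r, line in enumerate(lines) for c, ch in enumerate(line) if ch == "#"}


# Two 2x3 blocks joined by a single pixel in the middle column.
blobs = from_ascii(["##.##", "#####", "##.##"])
print(sorted(len(e) for e in segment(blobs, threshold=2)))  # [1, 6, 6]
print(len(segment(blobs, threshold=1)))                     # 1
```

With `threshold=2` the bridging pixel is cut off during the vertical pass, leaving a one-pixel fragment of the kind that the minimum-size filter of claim 2 would discard. With `threshold=1` no connection is cut and the sketch reduces to ordinary 4-connected labelling.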

Abstract

A method to segment, classify and clean an image is presented. It may be used in applications whose input is image data containing different classes of elements. The method finds, separates and classifies those elements. Only the significant elements need to be kept for further processing, so the amount of data to be processed may be significantly reduced.
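The segment, classify and clean flow described in the abstract can be sketched in a few lines of Python. This is not the patented method: step 1 is replaced by plain 4-connected component labelling, and the neural-net classifier is replaced by a hypothetical rule that labels tiny elements as noise and long, thin ones as lines. The thresholds `min_size` and `line_ratio` and the class names are assumptions chosen for the example.

```python
from collections import deque

Pixel = tuple[int, int]  # (row, col) of a foreground pixel


def components(pixels: set[Pixel]) -> list[set[Pixel]]:
    """4-connected component labelling, used here as a simple stand-in for step 1."""
    seen: set[Pixel] = set()
    elements: list[set[Pixel]] = []
    for start in pixels:
        if start in seen:
            continue
        seen.add(start)
        element = {start}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in pixels and nb not in seen:
                    seen.add(nb)
                    element.add(nb)
                    queue.append(nb)
        elements.append(element)
    return elements


def features(element: set[Pixel]) -> dict[str, int]:
    """Size and bounding-box extent of one image element."""
    rows = [r for r, _ in element]
    cols = [c for _, c in element]
    return {
        "size": len(element),
        "height": max(rows) - min(rows) + 1,
        "width": max(cols) - min(cols) + 1,
    }


def classify(f: dict[str, int], min_size: int = 2, line_ratio: float = 5.0) -> str:
    """Hypothetical rule-based classifier standing in for the trained neural net."""
    if f["size"] < min_size:
        return "noise"
    long_side = max(f["width"], f["height"])
    short_side = min(f["width"], f["height"])
    if long_side >= line_ratio * short_side:
        return "line"
    return "writing"


def clean(pixels: set[Pixel], unwanted: tuple[str, ...] = ("noise", "line")) -> set[Pixel]:
    """Segment, classify, and keep only the pixels of wanted elements."""
    kept = [e for e in components(pixels) if classify(features(e)) not in unwanted]
    return set().union(*kept)


def from_ascii(lines: list[str]) -> set[Pixel]:
    return {(r, c) for r, line in enumerate(lines) for c, ch in enumerate(line) if ch == "#"}


page = from_ascii([
    "..........",
    "##########",  # ruled line
    "..........",
    ".##....#..",  # handwriting blob and a speck of noise
    ".##.......",
])
print(len(page), "->", len(clean(page)))  # 15 -> 4
```

In this toy page 15 foreground pixels shrink to the 4 that belong to the handwriting blob, illustrating the kind of data reduction the abstract refers to.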

71 Citations

14 Claims

2. Method as in claim 1, wherein, in step 1, image elements that are below a required minimum size are discarded.

3. Method as in claim 1, wherein said feature extraction from each image element is performed during the segmentation process.

4. Method as in claim 3, wherein neighborhood and local features are calculated, said neighborhood feature values describing the relationship between the single image element and its neighboring image elements, said local feature values describing properties of the image element itself.

5. Method as in claim 4, wherein, as a neighborhood feature value, the number of neighboring image elements in a specific direction is calculated in combination with counts of only those image elements having nearly the same size properties.

6. Method as in claim 4, wherein, as a local feature value, a density feature is calculated, said density feature being the ratio between the number of foreground pixels and the number of background pixels in a rectangular area described by the maximum horizontal and vertical extensions of the image element.

7. Method as in claim 4, wherein each local feature value has a corresponding neighborhood feature value equivalent, said equivalent being calculated as the average of the local feature values from each image element inside a region given by a fixed radius, said calculated feature values being weighted by their specific distances.

8. Method as in claim 1, wherein in said classification step the feature values of each image element are fed into an artificial neural net, weighted internally, and an output is calculated giving a value indicative of the probability that the image element having that feature set belongs to a specific class.

9. Method as in claim 1, wherein said classification step comprises calculating for each image element, using an artificial neural network having multiple outputs, probability values for each image element class presented to said neural network during training of said neural network, and said probability values of the class membership of each image element are stored together with the image element for further processing, whereby recognized and stored classes are document parts.

10. Method as in claim 8, wherein said classification step is repeated until a stable result is achieved.

11. Method as in claim 8, wherein feedback is incorporated by using a known probability value of a specific class membership for each image element as an additional feature value, by calculating the average value of the probability values for a specific class from each image element inside a region given by a fixed radius, these feature values also feeding into said neural network.

12. Method as in claim 8, wherein classified image elements are grouped together into clusters of corresponding image elements, said grouping being based on information regarding size, position or associated feature values.

13. Method as in claim 1, wherein, before unwanted image elements are removed, those elements are checked for intersections with other image elements that are not to be removed.

14. Method as in claim 13, wherein a pair of intersecting image elements is replaced by a number of new image elements having no intersection, and the intersecting area itself is made part of one of the pair of original image elements.
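Several of the dependent claims (the density feature of claim 6, the distance-weighted neighborhood equivalent of claim 7, the probability output of claim 8, and the repeated classification with feedback of claims 10 and 11) can be combined into one small Python sketch. The claims leave the details open, so the following are assumptions: the distance weight is 1/(1 + d), an element is excluded from its own neighborhood average, a completely filled bounding box counts as having one background pixel, and the neural net is reduced to a single logistic neuron with hand-picked rather than trained weights.

```python
import math

Pixel = tuple[int, int]  # (row, col) of a foreground pixel


def density(element: set[Pixel]) -> float:
    """Claim 6: foreground / background pixels inside the bounding box."""
    rows = [r for r, _ in element]
    cols = [c for _, c in element]
    area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    foreground = len(element)
    # Assumption: a completely filled box counts as having one background pixel.
    return foreground / max(area - foreground, 1)


def centroid(element: set[Pixel]) -> tuple[float, float]:
    n = len(element)
    return (sum(r for r, _ in element) / n, sum(c for _, c in element) / n)


def neighborhood_average(values: list[float], centers: list[tuple[float, float]],
                         radius: float) -> list[float]:
    """Claim 7: distance-weighted average of the other elements' values within radius."""
    averages = []
    for i, (ri, ci) in enumerate(centers):
        num = den = 0.0
        for j, (rj, cj) in enumerate(centers):
            if i == j:
                continue  # assumption: an element is not its own neighbor
            d = math.hypot(ri - rj, ci - cj)
            if d <= radius:
                w = 1.0 / (1.0 + d)  # assumed weighting; the claim does not fix it
                num += w * values[j]
                den += w
        averages.append(num / den if den else 0.0)
    return averages


def neuron(x: tuple[float, ...], weights: tuple[float, ...], bias: float) -> float:
    """A single logistic unit standing in for the claimed neural net."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify_elements(elements: list[set[Pixel]], weights: tuple[float, float, float],
                      bias: float, radius: float = 10.0, eps: float = 1e-4,
                      max_iter: int = 20) -> list[float]:
    """Claims 8, 10, 11: repeat classification, feeding back neighborhood-averaged
    probabilities as an extra feature, until the result is stable."""
    centers = [centroid(e) for e in elements]
    local = [density(e) for e in elements]
    neighbor = neighborhood_average(local, centers, radius)
    probs = [0.5] * len(elements)  # assumed neutral starting probability
    for _ in range(max_iter):
        feedback = neighborhood_average(probs, centers, radius)
        new = [neuron((loc, nb, fb), weights, bias)
               for loc, nb, fb in zip(local, neighbor, feedback)]
        stable = max((abs(a - b) for a, b in zip(new, probs)), default=0.0) < eps
        probs = new
        if stable:
            break
    return probs


elements = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},  # dense 2x2 blob, density 4.0
    {(0, 5), (1, 6), (2, 7)},          # sparse diagonal stroke, density 0.5
]
probs = classify_elements(elements, weights=(0.8, -0.3, 0.5), bias=-1.0)
print(probs)
```

The loop stops either when no probability moves by more than `eps` or after `max_iter` passes. For this single logistic unit, a feedback weight of magnitude below 4 makes each update a contraction, so the repeated classification of claim 10 converges.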

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Rindtorff, Klaus
Primary Examiner(s)
SHALWALA, BIPIN H

Application Number

US08/726,887
Time in Patent Office

585 Days
Field of Search

382/173, 382/178, 382/179, 382/224, 382/225, 382/228, 382/156, 382/159, 382/190-195, 382/180, 382/284, 395/21, 364/274.9
US Class Current

382/178
CPC Class Codes

G06V 30/10   Character recognition

G06V 30/1444   Selective acquisition, loca...

G06V 30/15   Cutting or merging image el...

G06V 30/155   Removing patterns interferi...

G06V 30/40   Document-oriented image-bas...

G06V 40/30   Writer recognition; Reading...
