
Automatic separation of text from background in scanned images of complex documents

  • US 5,280,367 A
  • Filed: 05/28/1991
  • Issued: 01/18/1994
  • Est. Priority Date: 05/28/1991
  • Status: Expired due to Term
First Claim

1. A computer implemented process for separating text information from background information in a scanned electronic image of a document, said computer implemented process comprising the steps of:

    (a) electronically scanning the document to convert the document into said scanned electronic image of the document;

    (b) examining said scanned electronic image and dividing said scanned electronic image into a plurality of blocks;

    (c) constructing a histogram of gray scale values of pixels within one of said blocks;

    (d) dividing said histogram into three regions comprising a first region, a middle region and a last region;

    (e) determining a number of peaks of said histogram in each of said three regions;

    (f) if said histogram contains no peak in said middle region, setting a threshold gray scale level between a gray scale level of a peak having a highest gray scale level in said first region and a gray scale level of a peak having a lowest gray scale level in said last region;

    (g) separating said text information from said background information by reexamining said block using said threshold gray scale level set in step (f); and

    (h) repeating steps (c) through (g) for each of said plurality of blocks.
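
Steps (b) through (h) can be illustrated with a minimal sketch in Python. It is not the patented implementation. Several details are assumptions, because the claim does not fix them: the three histogram regions are taken as equal thirds of the 0 to 255 range, a peak is any bin that is a strict local maximum, the threshold is the midpoint between the two peaks named in step (f), and pixels darker than the threshold are treated as text. The names `find_peaks`, `block_threshold`, and `separate_text` are hypothetical. Blocks whose histogram does not satisfy the condition in step (f) are left unclassified, since claim 1 does not say how to handle them.

```python
from typing import List, Optional

Image = List[List[int]]  # rows of 8-bit gray values (0 = black, 255 = white)


def histogram(block: Image) -> List[int]:
    """Step (c): count how many pixels in the block have each gray level."""
    counts = [0] * 256
    for row in block:
        for value in row:
            counts[value] += 1
    return counts


def find_peaks(counts: List[int], min_count: int = 1) -> List[int]:
    """Gray levels whose bin is a strict local maximum (assumed peak rule)."""
    peaks = []
    for level, count in enumerate(counts):
        left = counts[level - 1] if level > 0 else 0
        right = counts[level + 1] if level < 255 else 0
        if count >= min_count and count > left and count > right:
            peaks.append(level)
    return peaks


def block_threshold(block: Image) -> Optional[int]:
    """Steps (d) to (f): return a threshold, or None if step (f) does not apply."""
    peaks = find_peaks(histogram(block))
    # Assumed region boundaries: equal thirds of the gray scale range.
    first = [p for p in peaks if p < 85]
    middle = [p for p in peaks if 85 <= p < 170]
    last = [p for p in peaks if p >= 170]
    if middle or not first or not last:
        return None  # case not covered by step (f) of the claim
    # The midpoint is one arbitrary choice of a level "between" the two peaks.
    return (max(first) + min(last)) // 2


def separate_text(image: Image, block_size: int = 8) -> List[List[Optional[bool]]]:
    """Steps (b), (g), (h): True = text, False = background, None = undecided."""
    height, width = len(image), len(image[0])
    mask: List[List[Optional[bool]]] = [[None] * width for _ in range(height)]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            threshold = block_threshold(block)
            if threshold is None:
                continue
            for dy, row in enumerate(block):
                for dx, value in enumerate(row):
                    # Assumption: text is darker than its background.
                    mask[top + dy][left + dx] = value < threshold
    return mask
```

Because the threshold is computed per block rather than once for the whole page, a region printed on a dark background can receive a different threshold from a region printed on white paper, which appears to be the point of dividing the image in step (b).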
