
Segmentation of text, picture and lines of a document image

  • US 5,335,290 A
  • Filed: 04/06/1992
  • Issued: 08/02/1994
  • Est. Priority Date: 04/06/1992
  • Status: Expired due to Term
First Claim

1. In a character recognition system, a method for segmenting portions of a medium into text and non-text types, said method comprising the steps of:

  • a) compressing a bit mapped representation of said medium, said compressing said bit mapped representation of said medium including
        a.i) providing said bit mapped representation of said medium to a compression means,
        a.ii) compressing group of N scanlines of said bit mapped representation into a corresponding compressed scanline, and
        a.iii) constructing a compressed representation of said medium from said compressed scanlines;

    b) providing said compressed representation of said medium to a run length extraction and classification means, said compressed representation comprised of one or more scanlines;

    c) extracting run lengths from each scanline of said compressed representation of said medium;

    d) creating a run length record for each extracted run length, each run length record including a classification of the corresponding run length as short, medium or long based on its length;

    e) constructing rectangles from said run length records, said rectangles representing a portion of said medium;

    f) determining a skew of said rectangles;

    g) correcting for skew of said rectangles;

    h) classifying each of said rectangles as type image, vertical line, horizontal line or unknown; and

    i) merging rectangles of type UNKNOWN into one or more text blocks.
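
Steps a), c) and d) of the claim can be illustrated with a minimal Python sketch. The claim does not state how a group of N scanlines is combined or where the short/medium/long boundaries lie, so the pixel-wise OR used in `compress_scanlines` and the `short_max` and `medium_max` thresholds are assumptions made for illustration only. All function names and the record layout are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def compress_scanlines(bitmap: Bitmap, n: int) -> Bitmap:
    """Step a): collapse each group of n scanlines into one compressed scanline.

    Assumption: a compressed pixel is black if any pixel in its column of the
    group is black (pixel-wise OR). The claim does not specify the operator.
    """
    compressed = []
    for top in range(0, len(bitmap), n):
        group = bitmap[top:top + n]
        compressed.append([1 if any(column) else 0 for column in zip(*group)])
    return compressed


def extract_runs(scanline: List[int]) -> List[Tuple[int, int]]:
    """Step c): return (start, length) for each run of black pixels."""
    runs = []
    start = None
    for x, pixel in enumerate(scanline):
        if pixel and start is None:
            start = x                      # a run begins
        elif not pixel and start is not None:
            runs.append((start, x - start))  # a run ends
            start = None
    if start is not None:                  # run reaches the end of the line
        runs.append((start, len(scanline) - start))
    return runs


def classify_run(length: int, short_max: int = 2, medium_max: int = 5) -> str:
    """Step d): label a run by length. Thresholds are hypothetical."""
    if length <= short_max:
        return "short"
    if length <= medium_max:
        return "medium"
    return "long"


def run_length_records(bitmap: Bitmap, n: int) -> List[Dict[str, object]]:
    """Build one record per run found in the compressed representation."""
    records = []
    for y, line in enumerate(compress_scanlines(bitmap, n)):
        for start, length in extract_runs(line):
            records.append(
                {"y": y, "start": start, "length": length,
                 "kind": classify_run(length)}
            )
    return records


page = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
]
records = run_length_records(page, n=2)
```

With `n=2`, the four scanlines of `page` collapse into two compressed scanlines, and each record carries the row, start column, length and class of one run. Later steps such as rectangle construction, skew correction and merging would consume these records; they are not sketched here because the claim does not describe how they are carried out.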
