System and method for automatic document segmentation

  • US 5,073,953 A
  • Filed: 09/01/1989
  • Issued: 12/17/1991
  • Est. Priority Date: 09/12/1988
  • Status: Expired due to Term
First Claim

1. A method for automatically segmenting a scanned document in an electronic document processing device to separate document areas containing different types of image information comprising the steps of:

  • (a) scanning the document to generate a scanned image thereof;

    (b) subdividing the scanned image into a matrix of subimages;

    (c) analyzing the information contained in each subimage and assigning to each subimage an initial label selected from a first set of labels to obtain an initial label matrix; and

    (d) relaxing the initial label matrix by changing the labels of individual matrix subimages into relaxed labels which are selected from a second set of labels, pursuant to a plurality of context rules, which is smaller in number than said first set to obtain a pattern of uniformly labeled segments representing those document areas containing different types of information, said context rules comprising the following:

    (i) "if in a predetermined array of subimages in the initial label matrix at least n subimages have the label BW and this array does not contain labels from a group G, then change all labels in this array to BW," wherein n is a predetermined number, BW is a predetermined initial label and G is a predetermined subset of the first set of labels;

    (ii) "if in a predetermined array of subimages in the initial label matrix at least m subimages have the label U, then change all labels in this array to U," wherein m is a predetermined number and U is a predetermined initial label; and

    (iii) "(1) where the label U forms intersecting vertical and horizontal runs, fill the whole rectangle spanned by these runs with the label U, (2) check all combinations of horizontal and vertical runs to maximize the area to be filled with the label U, and (3) if the height of the maximized area is smaller than hmin elements or the width is smaller than wmin elements, then change all labels within this area to BW," wherein U and BW are predetermined initial labels and hmin and wmin are predetermined numbers.
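
Context rules (i) and (ii) of step (d) can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one way the two window rules could be applied to a label matrix. Several points are assumptions made for illustration and are not stated in the claim: the "predetermined array" is taken to be a square window of side `size` slid over the matrix one subimage at a time, label counts are always read from the initial matrix, rule (ii) is applied after rule (i) so that U wins where windows overlap, and the label "T" and the function name `relax_labels` are hypothetical. Rule (iii) is not covered.

```python
from typing import List, Set

LabelMatrix = List[List[str]]


def relax_labels(initial: LabelMatrix, size: int, n: int, m: int,
                 group_g: Set[str]) -> LabelMatrix:
    """Apply context rules (i) and (ii) to a copy of the initial label matrix.

    Assumptions (not from the claim): square sliding windows of side `size`,
    counts taken from `initial`, rule (ii) applied after rule (i).
    """
    rows, cols = len(initial), len(initial[0])
    relaxed = [row[:] for row in initial]  # leave the initial matrix untouched

    def windows():
        # Yield the cell coordinates of every size x size window.
        for r in range(rows - size + 1):
            for c in range(cols - size + 1):
                yield [(r + dr, c + dc)
                       for dr in range(size) for dc in range(size)]

    # Rule (i): at least n BW labels and no label from group G -> all BW.
    for cells in windows():
        labels = [initial[i][j] for i, j in cells]
        if labels.count("BW") >= n and not group_g.intersection(labels):
            for i, j in cells:
                relaxed[i][j] = "BW"

    # Rule (ii): at least m U labels -> all U.
    for cells in windows():
        labels = [initial[i][j] for i, j in cells]
        if labels.count("U") >= m:
            for i, j in cells:
                relaxed[i][j] = "U"

    return relaxed


# Hypothetical 3 x 3 label matrix; "T" stands for some other initial label.
example = [
    ["BW", "BW", "T"],
    ["BW", "T",  "U"],
    ["T",  "U",  "U"],
]
result = relax_labels(example, size=2, n=3, m=3, group_g={"U"})
for row in result:
    print(row)
```

In this example the top-left 2 x 2 window holds three BW labels and no label from G, so rule (i) sets it to BW; the bottom-right window holds three U labels, so rule (ii) sets it to U, overwriting the shared center subimage.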
