System and method for automatic document segmentation
First Claim
1. A method for automatically segmenting a scanned document in an electronic document processing device to separate document areas containing different types of image information comprising the steps of:
(a) scanning the document to generate a scanned image thereof;
(b) subdividing the scanned image into a matrix of subimages;
(c) analyzing the information contained in each subimage and assigning to each subimage an initial label selected from a first set of labels to obtain an initial label matrix; and
(d) relaxing the initial label matrix by changing the labels of individual matrix subimages into relaxed labels which are selected from a second set of labels, pursuant to a plurality of context rules, which is smaller in number than said first set to obtain a pattern of uniformly labeled segments representing those document areas containing different types of information, said context rules comprising the following:
(i) "if in a predetermined array of subimages in the initial label matrix at least n subimages have the label BW and this array does not contain labels from a group G, then change all labels in this array to BW," wherein n is a predetermined number, BW is a predetermined initial label and G is a predetermined subset of the first set of labels;
(ii) "if in a predetermined array of subimages in the initial label matrix at least m subimages have the label U, then change all labels in this array to U," wherein m is a predetermined number and U is a predetermined initial label; and
(iii) (1) "where the label U forms intersecting vertical and horizontal runs, fill the whole rectangle spanned by these runs with the label U, (2) check all combinations of horizontal and vertical runs to maximize the area to be filled with the label U, and (3) if the height of the maximized area is smaller than hmin elements or the width is smaller than wmin elements, then change all labels within this area to BW," wherein U and BW are predetermined initial labels and hmin and wmin are predetermined numbers.
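The three context rules of step (d) can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The function names, the window parameters (top, left, h, w), and the example labels other than BW and U are assumptions made for illustration. The claim does not say how the "predetermined array" is positioned or how often rule (iii) is repeated, so rules (i) and (ii) act on one caller-chosen window and rule (iii) handles only the single largest rectangle in the matrix.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[str]]  # label matrix: one label per subimage

def window_cells(grid: Grid, top: int, left: int, h: int, w: int) -> List[Tuple[int, int]]:
    """Cells of an h-by-w array of subimages, clipped to the matrix bounds."""
    return [(r, c)
            for r in range(top, min(top + h, len(grid)))
            for c in range(left, min(left + w, len(grid[0])))]

def rule_i(grid: Grid, top: int, left: int, h: int, w: int, n: int, group_g: Set[str]) -> None:
    """Rule (i): at least n BW labels and no label from G, so the whole array becomes BW."""
    cells = window_cells(grid, top, left, h, w)
    found = [grid[r][c] for r, c in cells]
    if found.count("BW") >= n and not group_g.intersection(found):
        for r, c in cells:
            grid[r][c] = "BW"

def rule_ii(grid: Grid, top: int, left: int, h: int, w: int, m: int) -> None:
    """Rule (ii): at least m U labels, so the whole array becomes U."""
    cells = window_cells(grid, top, left, h, w)
    if sum(grid[r][c] == "U" for r, c in cells) >= m:
        for r, c in cells:
            grid[r][c] = "U"

def u_runs(line: List[str]) -> List[Tuple[int, int]]:
    """Maximal runs of U in one row or column, as inclusive (start, end) pairs."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(list(line) + [None]):  # sentinel closes a trailing run
        if label == "U" and start is None:
            start = i
        elif label != "U" and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs

def rule_iii(grid: Grid, hmin: int, wmin: int) -> None:
    """Rule (iii), simplified: take the intersecting horizontal/vertical U runs that
    span the largest rectangle, then fill it with U, or with BW if it is too small."""
    rows = len(grid)
    best = None  # (area, top, bottom, left, right)
    for r in range(rows):
        for left, right in u_runs(grid[r]):
            for c in range(left, right + 1):
                column = [grid[k][c] for k in range(rows)]
                for top, bottom in u_runs(column):
                    if top <= r <= bottom:  # this vertical run crosses the horizontal run
                        area = (bottom - top + 1) * (right - left + 1)
                        if best is None or area > best[0]:
                            best = (area, top, bottom, left, right)
    if best is None:
        return  # no U labels in the matrix
    _, top, bottom, left, right = best
    too_small = (bottom - top + 1) < hmin or (right - left + 1) < wmin
    fill = "BW" if too_small else "U"
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            grid[r][c] = fill

# Usage: a horizontal run in row 0 crosses a vertical run in column 1.
labels = [["U", "U", "U", "BW"],
          ["BW", "U", "BW", "BW"],
          ["BW", "U", "BW", "BW"]]
rule_iii(labels, hmin=2, wmin=2)
# The 3-by-3 rectangle spanned by the two runs is now filled with U.
```

A fuller version would presumably slide or tile the window over the whole matrix for rules (i) and (ii) and repeat rule (iii) for each separate group of runs; those choices are left open here because the claim text does not fix them.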
Abstract
In a system and method for automatically segmenting a scanned document to separate areas containing different types of information such as black/white text or graphics, continuous tone pictures, and half-tone pictures, the document is divided into a number of subimages and the individual subimages are classified in an initial labeling phase. The initial label matrix thus obtained is relaxed in a subsequent step, yielding a pattern of uniformly labeled segments corresponding to the areas in the document containing different types of information. In order to speed up the system and/or improve its robustness, the number of initial labels used in the initial labeling step is chosen to be larger than the number of types of information to be distinguished. The number of initial labels is then reduced in the relaxation step on the basis of a series of context rules.
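The initial labeling phase described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows of gray values, the tile size is chosen by the caller, and toy_classify is a hypothetical stand-in classifier, since this section does not describe how a subimage is actually analyzed.

```python
from typing import Callable, List

Image = List[List[int]]  # gray values, one inner list per scan line
Tile = List[List[int]]

def subdivide(image: Image, size: int) -> List[List[Tile]]:
    """Split the image into a matrix of size-by-size subimages (edge tiles may be smaller)."""
    return [[[row[c:c + size] for row in image[r:r + size]]
             for c in range(0, len(image[0]), size)]
            for r in range(0, len(image), size)]

def initial_label_matrix(image: Image, size: int,
                         classify: Callable[[Tile], str]) -> List[List[str]]:
    """Assign one initial label to every subimage."""
    return [[classify(tile) for tile in tile_row] for tile_row in subdivide(image, size)]

def toy_classify(tile: Tile) -> str:
    """Hypothetical rule, for illustration only: pure black/white tiles get BW, others U."""
    values = {v for row in tile for v in row}
    return "BW" if values <= {0, 255} else "U"

image = [[0, 255, 128, 130],
         [255, 0, 127, 129],
         [0, 0, 255, 255],
         [0, 0, 255, 255]]
initial = initial_label_matrix(image, 2, toy_classify)
# initial == [["BW", "U"], ["BW", "BW"]]
```

Passing the classifier in as a parameter keeps the subdivision step independent of whichever first set of labels is used; the relaxation step would then map the resulting matrix onto the smaller second set.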
6 Claims
Specification