System for segmenting line drawings from text within a binary digital image
Abstract
A system identifies and discriminates between image regions that consist of text lines of alphanumeric characters and image regions that consist largely of non-alphanumeric line-drawing components. Only image components that are determined to be alphanumeric characters are submitted to an OCR program, which saves processing time and avoids errors. The system mainly exploits the principle that text blocks in an image are characterized by regularly spaced horizontal runs of white pixels consistent with inter-line spaces.
18 Claims
1. A method of analyzing data forming a two-dimensional image, comprising the step of:
identifying a subset of black pixels in the data as likely to belong to a line drawing region if there is not a predetermined arrangement of horizontal runs of white pixels above and below the subset of black pixels, the identifying step including the step of, for a subset of the data characterized by a first run of white pixels overlapping a second run of white pixels by a minimum length W, the first run of white pixels being spaced from the second run of white pixels by a vertical separation H, identifying the subset of data as likely to belong to a line drawing region if W/H is not within a predetermined range.
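The W/H test recited in claim 1 can be illustrated with a minimal sketch. The `WhiteRun` type, the function names, and the numeric range used in the usage note are hypothetical; the claim states only that the range is "predetermined".

```python
from typing import NamedTuple, Tuple


class WhiteRun(NamedTuple):
    """A horizontal run of white pixels (hypothetical representation)."""
    row: int    # y coordinate of the run
    start: int  # first column of the run, inclusive
    end: int    # last column of the run, inclusive


def overlap_and_separation(a: WhiteRun, b: WhiteRun) -> Tuple[int, int]:
    """Return (W, H): horizontal overlap length and vertical separation."""
    w = min(a.end, b.end) - max(a.start, b.start) + 1
    h = abs(a.row - b.row)
    return max(w, 0), h


def likely_line_drawing(a: WhiteRun, b: WhiteRun,
                        ratio_min: float, ratio_max: float) -> bool:
    """True if W/H falls outside the assumed range [ratio_min, ratio_max]."""
    w, h = overlap_and_separation(a, b)
    if w == 0 or h == 0:
        # The claim presupposes overlapping runs on different rows.
        raise ValueError("runs must overlap and lie on different rows")
    return not (ratio_min <= w / h <= ratio_max)
```

For example, with an assumed range of 4 to 40, two runs that overlap by 180 pixels and sit 30 rows apart give W/H = 6, which is within the range, so the pixels between them are not flagged as line drawing.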
2. A method of analyzing data forming a two-dimensional image comprising the steps of:
identifying a subset of black pixels in the data as likely to belong to a line drawing region if there is not a predetermined arrangement of horizontal runs of white pixels above and below the subset of black pixels;
the identifying step including the step of, for a subset of the data characterized by a first run of white pixels overlapping a second run of white pixels by a minimum length W, the first run of white pixels being spaced from the second run of white pixels by a vertical separation H, identifying the subset of data as likely to belong to a line drawing region if W/H is not within a predetermined range; and
identifying the subset of data as likely to be a line drawing region if, for at least a pair of runs of white pixels overlapping by a length W1 or W2, separated respectively by vertical separation H1 or H2, neither W1/H1 nor W2/H2 is within a predetermined range.
3. The method of claim 2, further comprising the step of
deriving low-resolution data from the data, the low-resolution data being consistent with a low-resolution version of the two-dimensional image; and
wherein said identifying step is performed on the low-resolution data.
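One way to derive low-resolution data is a tiled, thresholded reduction, sketched below. The claims do not say how the reduction is done, so the tile-based rule, the function name `reduce_image`, and the `factor` and `threshold` parameters are all assumptions made for illustration.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 values, with 1 meaning a black pixel


def reduce_image(image: Bitmap, factor: int, threshold: int = 1) -> Bitmap:
    """Map each factor x factor tile to one pixel.

    The output pixel is black when the tile holds at least `threshold`
    black pixels (an assumed rule, not one taken from the claims).
    """
    height = len(image)
    width = len(image[0]) if image else 0
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Count black pixels in the tile, clipping at the image border.
            count = sum(
                image[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            )
            row.append(1 if count >= threshold else 0)
        reduced.append(row)
    return reduced
```

A low threshold keeps thin strokes visible in the reduced image; a higher one suppresses isolated specks at the cost of losing fine lines.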
7. The method of claim 2, further comprising the step of
identifying as likely to belong to line drawing components, a set of black pixels which are connected to data in the image which is identified as likely to belong to a line drawing region.
8. The method of claim 7, further comprising the steps of
subtracting the set of data likely to belong to line drawing components from the data; and
identifying a remainder of data, following said subtracting step, as likely to belong to text regions.
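Claims 7 and 8 can be read as a seed fill followed by a subtraction. The sketch below assumes 8-connectivity and represents both the image and the line-drawing region as 0/1 grids; the function names `fill_from_seeds` and `subtract` are hypothetical.

```python
from collections import deque
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 values, with 1 meaning a black pixel


def fill_from_seeds(image: Bitmap, seeds: Bitmap) -> Bitmap:
    """Return the black pixels of `image` that are 8-connected to a seed."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    queue = deque()
    # Start from every black pixel that lies inside the seed region.
    for y in range(height):
        for x in range(width):
            if seeds[y][x] and image[y][x]:
                mask[y][x] = 1
                queue.append((y, x))
    # Breadth-first growth through neighbouring black pixels.
    while queue:
        y, x = queue.popleft()
        for ny in (y - 1, y, y + 1):
            for nx in (x - 1, x, x + 1):
                if (0 <= ny < height and 0 <= nx < width
                        and image[ny][nx] and not mask[ny][nx]):
                    mask[ny][nx] = 1
                    queue.append((ny, nx))
    return mask


def subtract(image: Bitmap, mask: Bitmap) -> Bitmap:
    """Keep the black pixels of `image` that are not set in `mask`."""
    return [
        [1 if pixel and not hit else 0 for pixel, hit in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
```

Here `fill_from_seeds(image, region)` would give the likely line drawing components, and `subtract(image, line_art)` the remainder treated as likely text.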
9. The method of claim 7, comprising the step of
deriving low-resolution data from the data, the low-resolution data being consistent with a low-resolution version of the two-dimensional image, and performing said step of subtracting the set of data likely to belong to line drawing components on the low-resolution data.
10. The method of claim 7, further comprising the step of
identifying as likely to be body text, from the data not identified as likely to belong to a line drawing region, clusters of a predetermined size.
4. A method of analyzing data forming a two-dimensional image, comprising the steps of:
identifying a subset of black pixels in the data as likely to belong to a line drawing region if there is not a predetermined arrangement of horizontal runs of white pixels above and below the subset of black pixels, the identifying step including the step of deriving low-resolution data from the data, the low-resolution data being consistent with a low-resolution version of the two-dimensional image, wherein said identifying step is performed on the low-resolution data;
for a subset of the data characterized by a first run of white pixels overlapping a second run of white pixels by a minimum length W, the first run of white pixels being spaced from the second run of white pixels by a vertical separation H, identifying the subset of data as likely to be a line drawing region if W/H is not within a predetermined range; and
the step of deriving low-resolution data including the steps of rendering the data a low resolution map;
performing on the low resolution map a closing with a vertical structuring element of height H; and
performing on the resulting low resolution map a dilation with a horizontal structuring element of a width of not more than W.
5. The method of claim 4, further comprising the steps of
eroding a horizontal run of white pixels from the data; and
rendering the eroded horizontal run of white pixels on a low resolution map.
6. The method of claim 4, the step of using a horizontal structuring element having a width of one pixel less than W.
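The morphological steps of claims 4 and 6 (a closing with a vertical element of height H, then a dilation with a horizontal element of width W - 1) can be sketched in plain Python. This is an illustration under assumptions: 1 marks the foreground of the low resolution map, the origin of each structuring element is placed near its centre, and pixels outside the map are ignored. The helper names are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]         # rows of 0/1 values, 1 = foreground of the map
Offsets = List[Tuple[int, int]]  # structuring element as (dy, dx) offsets


def vertical_element(h: int) -> Offsets:
    return [(dy, 0) for dy in range(-(h // 2), h - h // 2)]


def horizontal_element(w: int) -> Offsets:
    return [(0, dx) for dx in range(-(w // 2), w - w // 2)]


def dilate(image: Bitmap, offsets: Offsets) -> Bitmap:
    """Set every in-bounds pixel reached from a foreground pixel by an offset."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = 1
    return out


def erode(image: Bitmap, offsets: Offsets) -> Bitmap:
    """Keep a pixel only if every in-bounds offset position is foreground."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = int(all(
                image[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < height and 0 <= x + dx < width
            ))
    return out


def close_then_dilate(low_res_map: Bitmap, h: int, w: int) -> Bitmap:
    """Closing with a vertical element of height h, then horizontal dilation.

    The horizontal element is w - 1 pixels wide, as in claim 6 (needs w >= 2).
    """
    vertical = vertical_element(h)
    closed = erode(dilate(low_res_map, vertical), vertical)
    return dilate(closed, horizontal_element(w - 1))
```

The closing bridges vertical gaps shorter than the element, and the horizontal dilation then widens what remains by less than W.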
11. A method of analyzing data forming a two-dimensional image, comprising the steps of:
identifying a subset of black pixels in the data as likely to belong to a line drawing region if there is not a predetermined arrangement of horizontal runs of white pixels above and below the subset of black pixels;
identifying as likely to be body text, from the data not identified as likely to belong to a line drawing region, clusters of a predetermined size;
identifying as likely to belong to line drawing components, a set of black pixels which are connected to data in the image which is identified as likely to belong to a line drawing region; and
performing a closing on the clusters.
12. The method of claim 11, further comprising the step of
subtracting data identified as likely to be body text from any data in the image not previously identified as a text region, thereby yielding as a remainder a refined set of line art components.
13. The method of claim 11, further comprising the step of
identifying as likely to be stray text, from the data identified as likely to be text components, data which is not included in clusters of data of a predetermined size.
14. A method of analyzing full-resolution data forming a two-dimensional image, comprising the steps of:
deriving low-resolution data from the full-resolution data, the low-resolution data being consistent with a low-resolution version of the two-dimensional image;
for a subset of the low-resolution data characterized by a first run of white pixels overlapping a second run of white pixels by a minimum length W, the first run of white pixels being spaced from the second run of white pixels by a vertical spacing H, identifying the subset of data as likely to belong to a line drawing region if W/H is not within a predetermined range;
identifying as likely to belong to line art components within the low-resolution data, a set of black pixels which are connected to data in the image which is identified as likely to belong to a line drawing region.
15. The method of claim 14, further comprising the step of
identifying as likely to be body text, from the low-resolution data not identified as likely to belong to a line drawing region, clusters of data of a predetermined size.
16. The method of claim 14, further comprising the step of detecting data which is not included in clusters of data of a predetermined size from the low-resolution data not identified as likely to belong to a line drawing region, and identifying such detected data as likely to be stray text.
17. The method of claim 15, further comprising the step of
subtracting data identified as likely to be body text from any data in the image not previously identified as a text region, thereby yielding as a remainder a refined set of line art components.
18. The method of claim 17, further comprising the steps of
identifying as stray text, from the low-resolution data not identified as likely to belong to a line drawing region, connected components which are not included in clusters of data of a predetermined size; and
adding a connected component of the stray text to the refined set of line art components if the connected component of the stray text is connected to any of the refined set of line art components.
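The final step of claim 18 can be sketched as follows, assuming that "connected to" means 8-adjacency between a stray-text component and a line art pixel. The function names and the 0/1 grid representation are hypothetical, and the stray-text mask is taken as given.

```python
from typing import List, Set, Tuple

Bitmap = List[List[int]]  # rows of 0/1 values, with 1 meaning a set pixel
Pixel = Tuple[int, int]   # (row, column)


def components(mask: Bitmap) -> List[Set[Pixel]]:
    """Split the set pixels of `mask` into 8-connected components."""
    height, width = len(mask), len(mask[0])
    seen: Set[Pixel] = set()
    found = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] and (y, x) not in seen:
                seen.add((y, x))
                stack = [(y, x)]
                component = set()
                while stack:
                    cy, cx = stack.pop()
                    component.add((cy, cx))
                    for ny in (cy - 1, cy, cy + 1):
                        for nx in (cx - 1, cx, cx + 1):
                            if (0 <= ny < height and 0 <= nx < width
                                    and mask[ny][nx] and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                found.append(component)
    return found


def touches(component: Set[Pixel], mask: Bitmap) -> bool:
    """True if any pixel of `component` is on or next to a set pixel of `mask`."""
    height, width = len(mask), len(mask[0])
    return any(
        0 <= ny < height and 0 <= nx < width and mask[ny][nx]
        for y, x in component
        for ny in (y - 1, y, y + 1)
        for nx in (x - 1, x, x + 1)
    )


def attach_stray_text(stray: Bitmap, line_art: Bitmap) -> Bitmap:
    """Add each stray component that touches line art to a copy of line art."""
    merged = [row[:] for row in line_art]
    for component in components(stray):
        if touches(component, line_art):
            for y, x in component:
                merged[y][x] = 1
    return merged
```

Each stray component is tested against the original line art only, so attachment does not cascade from one stray component to another; whether it should is a design choice the claim leaves open.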
Specification