Document image processing method and apparatus
First Claim
1. A method for processing a document image, comprising:
performing horizontal text line extraction on the document image, to obtain horizontal text lines, the number of rows of the horizontal text lines being represented by Nh;
performing vertical text line extraction on the document image, to obtain vertical text lines, the number of columns of the vertical text lines being represented by Nv;
providing an overlapping matrix represented by MO with Nh rows and Nv columns, a value of an element represented by MO(i, j) of the ith row and the jth column of the overlapping matrix MO indicating an overlapping relation between the ith row of horizontal text lines and the jth column of vertical text lines, where 1≦i≦Nh and 1≦j≦Nv;
merging the overlapping matrix MO in the vertical direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a column of vertical text lines and each of a plurality of rows of horizontal text lines is set as a same value if the column of vertical text lines overlaps with the plurality of rows of horizontal text lines simultaneously;
merging the overlapping matrix MO in the horizontal direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a row of horizontal text lines and each of a plurality of columns of vertical text lines is set as a same value if the row of horizontal text lines overlaps with the plurality of columns of vertical text lines simultaneously;
determining one or more text overlapping regions in the document image, based on the values of the elements of the merged overlapping matrix MO;
counting the total number of strokes or pixel points in the horizontal text lines and in the vertical text lines, respectively, within one of the one or more text overlapping regions; and
determining an orientation of the one of the one or more text overlapping regions is a horizontal orientation if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, otherwise, determining the orientation of the one of the one or more text overlapping regions is a vertical orientation.
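The matrix steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each extracted text line is given as an axis-aligned bounding box, treats "overlapping relation" as box intersection, and reads "set as a same value" as assigning one shared label to every linked cell. The names `Box`, `boxes_overlap`, `build_overlap_matrix`, `merge_vertical`, `merge_horizontal`, and `overlapping_regions` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a text line is a box (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_overlap_matrix(h_lines: List[Box], v_lines: List[Box]) -> List[List[int]]:
    """MO has Nh rows and Nv columns; 0 means no overlap,
    any other value is a provisional label unique to that cell."""
    mo, next_label = [], 1
    for h in h_lines:
        row = []
        for v in v_lines:
            if boxes_overlap(h, v):
                row.append(next_label)
                next_label += 1
            else:
                row.append(0)
        mo.append(row)
    return mo


def _relabel(mo: List[List[int]], labels: Set[int], target: int) -> None:
    """Replace every occurrence of the given labels with the target label."""
    for row in mo:
        for j, value in enumerate(row):
            if value in labels:
                row[j] = target


def merge_vertical(mo: List[List[int]]) -> None:
    """A column of vertical text overlapping several rows: give those cells one label."""
    n_cols = len(mo[0]) if mo else 0
    for j in range(n_cols):
        labels = {row[j] for row in mo if row[j] != 0}
        if len(labels) > 1:
            _relabel(mo, labels, min(labels))


def merge_horizontal(mo: List[List[int]]) -> None:
    """A row of horizontal text overlapping several columns: give those cells one label."""
    for i in range(len(mo)):
        labels = {value for value in mo[i] if value != 0}
        if len(labels) > 1:
            _relabel(mo, labels, min(labels))


def overlapping_regions(mo: List[List[int]]) -> Dict[int, Tuple[Set[int], Set[int]]]:
    """Group cells by label: label -> (row indices, column indices)."""
    regions: Dict[int, Tuple[Set[int], Set[int]]] = {}
    for i, row in enumerate(mo):
        for j, value in enumerate(row):
            if value:
                rows, cols = regions.setdefault(value, (set(), set()))
                rows.add(i)
                cols.add(j)
    return regions
```

Each label that survives both merges identifies one candidate text overlapping region, together with the horizontal rows and vertical columns that take part in it. The indices here are 0-based, whereas the claim numbers rows and columns from 1.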
Abstract
A method for processing a document image includes: performing horizontal and vertical text line extraction on the document image; providing an overlapping matrix, a value of an element of the overlapping matrix indicating an overlapping relation between horizontal and vertical text lines; merging the overlapping matrix in the vertical and horizontal directions; determining one or more text overlapping regions in the document image based on the values of the elements of the merged overlapping matrix; counting the total number of strokes or pixel points in the horizontal and vertical text lines, respectively, within one of the one or more text overlapping regions; and determining that the orientation of that text overlapping region is horizontal if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, and otherwise determining that the orientation is vertical.
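The final comparison can be sketched as follows. This is an illustration under simplifying assumptions, not the patent's procedure: the image is a binary grid with 1 for a foreground pixel, each text line in the region is reduced to a bounding box, and the "total number of pixel points" is taken as the foreground pixels inside those boxes. The names `count_foreground` and `region_orientation` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a text line is a box (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def count_foreground(image: List[List[int]], box: Box) -> int:
    """Count foreground pixels (value 1) inside the box."""
    left, top, right, bottom = box
    return sum(image[y][x] for y in range(top, bottom) for x in range(left, right))


def region_orientation(image: List[List[int]],
                       h_boxes: List[Box],
                       v_boxes: List[Box]) -> str:
    """Horizontal only if the horizontal lines hold strictly more pixels;
    a tie falls to vertical, matching the 'otherwise' branch."""
    h_total = sum(count_foreground(image, b) for b in h_boxes)
    v_total = sum(count_foreground(image, b) for b in v_boxes)
    return "horizontal" if h_total > v_total else "vertical"
```

The strict comparison mirrors the wording "larger than": equal totals resolve to the vertical orientation.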
8 Claims
1. A method for processing a document image, comprising:
performing horizontal text line extraction on the document image, to obtain horizontal text lines, the number of rows of the horizontal text lines being represented by Nh;
performing vertical text line extraction on the document image, to obtain vertical text lines, the number of columns of the vertical text lines being represented by Nv;
providing an overlapping matrix represented by MO with Nh rows and Nv columns, a value of an element represented by MO(i, j) of the ith row and the jth column of the overlapping matrix MO indicating an overlapping relation between the ith row of horizontal text lines and the jth column of vertical text lines, where 1≦i≦Nh and 1≦j≦Nv;
merging the overlapping matrix MO in the vertical direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a column of vertical text lines and each of a plurality of rows of horizontal text lines is set as a same value if the column of vertical text lines overlaps with the plurality of rows of horizontal text lines simultaneously;
merging the overlapping matrix MO in the horizontal direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a row of horizontal text lines and each of a plurality of columns of vertical text lines is set as a same value if the row of horizontal text lines overlaps with the plurality of columns of vertical text lines simultaneously;
determining one or more text overlapping regions in the document image, based on the values of the elements of the merged overlapping matrix MO;
counting the total number of strokes or pixel points in the horizontal text lines and in the vertical text lines, respectively, within one of the one or more text overlapping regions; and
determining an orientation of the one of the one or more text overlapping regions is a horizontal orientation if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, otherwise, determining the orientation of the one of the one or more text overlapping regions is a vertical orientation.
Dependent claims: 2, 3, 4
5. An apparatus for processing a document image, comprising:
a horizontal text line extraction unit adapted to perform horizontal text line extraction on the document image, to obtain horizontal text lines, the number of rows of the horizontal text lines being represented by Nh;
a vertical text line extraction unit adapted to perform vertical text line extraction on the document image, to obtain vertical text lines, the number of columns of the vertical text lines being represented by Nv;
an overlapping matrix providing unit adapted to provide an overlapping matrix represented by MO with Nh rows and Nv columns, a value of an element represented by MO(i, j) of the ith row and the jth column of the overlapping matrix MO indicating an overlapping relation between the ith row of horizontal text lines and the jth column of vertical text lines, where 1≦i≦Nh and 1≦j≦Nv;
a vertical merging unit adapted to merge the overlapping matrix MO in the vertical direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a column of vertical text lines and each of a plurality of rows of horizontal text lines is set as a same value if the column of vertical text lines overlaps with the plurality of rows of horizontal text lines simultaneously;
a horizontal merging unit adapted to merge the overlapping matrix MO in the horizontal direction, so that a value of an element of the overlapping matrix MO indicating an overlapping relation between a row of horizontal text lines and each of a plurality of columns of vertical text lines is set as a same value if the row of horizontal text lines overlaps with the plurality of columns of vertical text lines simultaneously;
a text overlapping region determining unit adapted to determine one or more text overlapping regions in the document image, based on the values of the elements of the overlapping matrix MO merged by the vertical merging unit and the horizontal merging unit;
a counting unit adapted to count the total number of strokes or pixel points in the horizontal text lines and in the vertical text lines, respectively, within one of the one or more text overlapping regions determined by the text overlapping region determining unit; and
a text orientation determining unit adapted to determine an orientation of the one of the one or more text overlapping regions is a horizontal orientation if the total number of strokes or pixel points in the horizontal text lines counted by the counting unit is larger than that in the vertical text lines, otherwise, to determine the orientation of the one of the one or more text overlapping regions is a vertical orientation.
Dependent claims: 6, 7, 8
Specification