System for analyzing table images
Abstract
A method for block selection on an image of a table, the table including rows and columns defined by visible and non-visible grid lines and containing table cells, includes identifying super-cells that include one or more table cells, wherein the super-cells are identified according to traced white areas surrounding table cells and bounded by visible grid lines; determining whether the vertical and horizontal grid lines bounding each table cell are visible or non-visible; and determining whether the vertical and horizontal grid lines bounding each super-cell are visible or non-visible.
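The abstract does not say how a grid line is judged visible or non-visible. The following minimal Python sketch assumes a binary image stored as rows of 0/1 integers (1 = black) and a hypothetical coverage threshold of 0.9. Under those assumptions, it treats a line segment as visible when most of the pixels along it are black. The names `is_line_visible` and `cell_border_visibility` are illustrative only and do not come from the patent.

```python
from typing import Dict, List

# Assumed representation: image[y][x] == 1 means a black pixel.
Image = List[List[int]]


def is_line_visible(image: Image, orientation: str, pos: int,
                    start: int, end: int, min_coverage: float = 0.9) -> bool:
    """Return True if a grid line segment is mostly black.

    orientation "h": horizontal line at row `pos`, columns start..end-1.
    orientation "v": vertical line at column `pos`, rows start..end-1.
    `min_coverage` is an assumed threshold, not a value from the patent.
    """
    if end <= start:
        return False
    if orientation == "h":
        pixels = image[pos][start:end]
    elif orientation == "v":
        pixels = [image[y][pos] for y in range(start, end)]
    else:
        raise ValueError("orientation must be 'h' or 'v'")
    return sum(pixels) / len(pixels) >= min_coverage


def cell_border_visibility(image: Image, top: int, bottom: int,
                           left: int, right: int) -> Dict[str, bool]:
    """Classify the four lines bounding a cell (or super-cell) whose border
    pixels lie at rows top/bottom and columns left/right, inclusive."""
    return {
        "top": is_line_visible(image, "h", top, left, right + 1),
        "bottom": is_line_visible(image, "h", bottom, left, right + 1),
        "left": is_line_visible(image, "v", left, top, bottom + 1),
        "right": is_line_visible(image, "v", right, top, bottom + 1),
    }
```

For a 5 x 5 box drawn without its right edge, `cell_border_visibility(img, 0, 4, 0, 4)` reports the right line as non-visible and the other three as visible.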
33 Claims
1. A method for performing block selection processing on an image of a table, the table comprised of rows and columns defined by visible and non-visible vertical and horizontal grid lines and containing super-cells, the super-cells including one or more table cells, the method comprising:
identifying super-cells that include one or more table cells, wherein super-cells are identified according to traced white areas surrounding table cells and bounded by visible grid lines;
determining whether the vertical and horizontal grid lines bounding each table cell are visible or non-visible; and
determining whether the vertical and horizontal grid lines bounding each super-cell are visible or non-visible.
2. A method according to claim 1, further comprising:
detecting areas of reversed text within the image of the table;
calculating a vertical histogram reflecting connected components within the image of the table, the histogram not reflecting connected components within the detected areas;
defining columns within the image of the table according to the vertical histogram; and
re-defining the columns based on locations of traced white areas and partial grid lines with respect to the defined columns.
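The claim leaves the histogram construction open. The Python sketch below makes several assumptions:

- Connected components have already been reduced to bounding boxes.
- Any component lying wholly inside a detected reversed-text area is excluded.
- The histogram counts how many components cover each x coordinate.
- Each run of non-zero bins is taken as one column.

The names, the box format and the containment rule are all assumptions. The claimed re-defining step, which uses traced white areas and partial grid lines, is not modeled. Passing `axis="y"` gives the row-forming counterpart recited in claim 4.

```python
from typing import List, Optional, Tuple

# Assumed bounding-box format: (x0, y0, x1, y1), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def projection_histogram(components: List[Box], reversed_areas: List[Box],
                         length: int, axis: str = "x") -> List[int]:
    """For each coordinate along `axis`, count the connected components that
    cover it, skipping components inside reversed-text areas.

    axis="x" projects onto the x axis (a vertical histogram, for columns);
    axis="y" projects onto the y axis (a horizontal histogram, for rows).
    """
    lo, hi = (0, 2) if axis == "x" else (1, 3)
    hist = [0] * length
    for box in components:
        if any(inside(box, area) for area in reversed_areas):
            continue  # ignore components within detected reversed text
        for i in range(max(box[lo], 0), min(box[hi], length - 1) + 1):
            hist[i] += 1
    return hist


def runs_from_histogram(hist: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of consecutive non-zero bins;
    each range is taken as one column (or row)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, count in enumerate(hist):
        if count and start is None:
            start = i
        elif not count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs
```

Because the component at x = 10..12 lies inside the reversed area at x = 9..13, it does not open a third column in the test data.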
3. A method according to claim 2, wherein the step of detecting areas of reversed text comprises:
calculating a horizontal histogram of the image of the table reflecting white pixels within the image;
identifying consecutive rows within the image in which a total distance between boundary pixels in a row is less than one half of a total length of the row;
detecting a traced white area within the consecutive rows;
determining whether the detected traced white area corresponds to reversed text according to a size of the traced white area.
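The claim text does not define "total distance between boundary pixels in a row". This Python sketch uses a hedged reading of that phrase: it takes it as the number of white pixels in the row. Under that reading, a candidate row is one that is mostly black.

The sketch then proceeds as follows:

- Consecutive candidate rows are grouped into a dark band.
- The bounding box of the white pixels inside the band stands in for the traced white area.
- A hypothetical `max_white_area` limit plays the role of the size test.

None of these names or thresholds come from the patent.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed representation: 1 = black, 0 = white


def white_row_histogram(image: Image) -> List[int]:
    """Number of white pixels in each row (a horizontal histogram)."""
    return [row.count(0) for row in image]


def dark_row_runs(image: Image) -> List[Tuple[int, int]]:
    """Inclusive runs of consecutive rows whose white pixels cover less than
    half of the row length (assumed reading of the claimed row test)."""
    hist = white_row_histogram(image)
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, white in enumerate(hist):
        dark = white < len(image[y]) / 2
        if dark and start is None:
            start = y
        elif not dark and start is not None:
            runs.append((start, y - 1))
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs


def white_box_in_rows(image: Image, top: int,
                      bottom: int) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x0, y0, x1, y1) of white pixels in rows top..bottom,
    a crude stand-in for tracing the white area inside a dark band."""
    pts = [(x, y) for y in range(top, bottom + 1)
           for x, v in enumerate(image[y]) if v == 0]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def detect_reversed_text(image: Image,
                         max_white_area: int = 50) -> List[Tuple[int, int]]:
    """Return the row runs judged to contain reversed (white-on-black) text.
    `max_white_area` is an assumed size limit for the white area."""
    found = []
    for top, bottom in dark_row_runs(image):
        box = white_box_in_rows(image, top, bottom)
        if box is None:
            continue  # a solid black band has no white strokes, so no text
        area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
        if area <= max_white_area:
            found.append((top, bottom))
    return found
```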
4. A method according to claim 1, further comprising:
detecting areas of reversed text within the image of the table;
calculating a horizontal histogram reflecting connected components within the image of the table, the histogram not reflecting connected components within the detected areas;
defining rows within the image of the table according to the horizontal histogram; and
re-defining the rows based on locations of traced white areas and partial grid lines with respect to the defined rows.
5. A method according to claim 4, wherein the step of detecting areas of reversed text comprises:
calculating a horizontal histogram of the image of the table reflecting white pixels within the image;
identifying consecutive rows within the image in which a total distance between boundary pixels in a row is less than one half of a total length of the row;
detecting a traced white area within the consecutive rows;
determining whether the detected traced white area corresponds to reversed text according to a size of the traced white area.
6. A method according to claim 1, wherein, in said identifying step, a first super-cell is identified according to a location and a dimension of a first table cell in a case that the first table cell is not surrounded by a white area.
7. A method according to claim 6, further comprising identifying a dummy table cell at a row and column address within a second super-cell in a case that the second super-cell does not include a table cell at the row and column address.
8. A method according to claim 7, further comprising storing data regarding the identified super-cells in a hierarchical tree structure which reflects a physical layout of the image of the table.
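Claims 7 and 8 can be pictured with a small data model. This Python sketch makes the following assumptions:

- A super-cell records an inclusive row and column address range.
- Every unoccupied address in that range is filled with a dummy cell.
- Super-cells hang under a table root node.

The class names and fields are hypothetical. This is an illustration of the idea, not the patent's actual tree structure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cell:
    row: int
    col: int
    text: str = ""
    dummy: bool = False  # True for placeholder cells (claim 7)


@dataclass
class SuperCell:
    rows: Tuple[int, int]  # inclusive row address range (assumed field)
    cols: Tuple[int, int]  # inclusive column address range (assumed field)
    cells: List[Cell] = field(default_factory=list)

    def fill_dummy_cells(self) -> None:
        """Add a dummy cell at every row/column address in this super-cell's
        range that has no table cell, then keep cells in address order."""
        present = {(c.row, c.col) for c in self.cells}
        for r in range(self.rows[0], self.rows[1] + 1):
            for c in range(self.cols[0], self.cols[1] + 1):
                if (r, c) not in present:
                    self.cells.append(Cell(r, c, dummy=True))
        self.cells.sort(key=lambda cell: (cell.row, cell.col))


@dataclass
class TableNode:
    """Root of a hypothetical tree: table -> super-cells -> cells."""
    super_cells: List[SuperCell] = field(default_factory=list)

    def all_cells(self) -> List[Cell]:
        return [cell for sc in self.super_cells for cell in sc.cells]
```

A 2 x 2 super-cell holding only cells (0, 0) and (1, 1) gains dummy cells at (0, 1) and (1, 0), so every address in its range is occupied.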
9. A method according to claim 1, further comprising:
determining a distance between traced white areas corresponding to adjacent columns;
calculating a location of a vertical grid line corresponding to the adjacent columns based on the determined distance.
10. A method according to claim 9, wherein an uppermost text line in the table image is ignored in a case that the table image includes more than four rows of text lines.
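This Python sketch shows one plausible reading of claims 9 and 10:

1. Each column's text extent is collected across rows.
2. The gap between adjacent columns is measured.
3. The non-visible vertical grid line is placed at the midpoint of that gap.

Three things are assumptions: the input format, the midpoint rule, and treating column text extents as the edges of the traced white areas. Claim 10's rule is applied as stated: the uppermost text line is skipped when there are more than four rows. The claim gives no rationale for that rule.

```python
from typing import List, Tuple

# Assumed input: for each text row (top to bottom), a list of (x0, x1)
# text extents, one per column, in column order.
Extents = List[List[Tuple[int, int]]]


def vertical_grid_lines(rows: Extents) -> List[float]:
    """Estimate x positions of non-visible vertical grid lines between
    adjacent columns by placing each line in the middle of the gap."""
    # Claim 10: ignore the uppermost text line when there are more than
    # four rows (one guess at the reason: a heading that spans columns).
    if len(rows) > 4:
        rows = rows[1:]
    n_cols = len(rows[0])
    lines = []
    for i in range(n_cols - 1):
        left_edge = max(r[i][1] for r in rows)       # right end of column i
        right_edge = min(r[i + 1][0] for r in rows)  # left end of column i+1
        distance = right_edge - left_edge            # width of the white gap
        lines.append(left_edge + distance / 2)
    return lines
```

In the test data, the wide first line (0..40) would otherwise push the estimated line rightward. Because the table has five rows, that line is skipped and the grid line lands at x = 15.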
11. A computer-readable medium storing computer-executable process steps to perform block selection processing on an image of a table, the table comprised of rows and columns defined by visible and non-visible vertical and horizontal grid lines and containing super-cells, the super-cells including one or more table cells, the steps comprising:
an identifying step to identify super-cells that include one or more table cells, wherein super-cells are identified according to traced white areas surrounding table cells and bounded by visible grid lines;
a determining step to determine whether the vertical and horizontal grid lines bounding each table cell are visible or non-visible; and
a determining step to determine whether the vertical and horizontal grid lines bounding each super-cell are visible or non-visible.
12. A computer-readable medium storing computer-executable process steps according to claim 11, further comprising:
a detecting step to detect areas of reversed text within the image of the table;
a calculating step to calculate a vertical histogram reflecting connected components within the image of the table, the histogram not reflecting connected components within the detected areas;
a defining step to define columns within the image of the table according to the vertical histogram; and
a re-defining step to re-define the columns based on locations of traced white areas and partial grid lines with respect to the defined columns.
13. A computer-readable medium storing computer-executable process steps according to claim 12, wherein the step of detecting areas of reversed text comprises:
a calculating step to calculate a horizontal histogram of the image of the table reflecting white pixels within the image;
an identifying step to identify consecutive rows within the image in which a total distance between boundary pixels in a row is less than one half of a total length of the row;
a detecting step to detect a traced white area within the consecutive rows;
a determining step to determine whether the detected traced white area corresponds to reversed text according to a size of the traced white area.
14. A computer-readable medium storing computer-executable process steps according to claim 11, further comprising:
a detecting step to detect areas of reversed text within the image of the table;
a calculating step to calculate a horizontal histogram reflecting connected components within the image of the table, the histogram not reflecting connected components within the detected areas;
a defining step to define rows within the image of the table according to the horizontal histogram; and
a re-defining step to re-define the rows based on locations of traced white areas and partial grid lines with respect to the defined rows.
15. A computer-readable medium storing computer-executable process steps according to claim 14, wherein the step of detecting areas of reversed text comprises:
a calculating step to calculate a horizontal histogram of the image of the table reflecting white pixels within the image;
an identifying step to identify consecutive rows within the image in which a total distance between boundary pixels in a row is less than one half of a total length of the row;
a detecting step to detect a traced white area within the consecutive rows;
a determining step to determine whether the detected traced white area corresponds to reversed text according to a size of the traced white area.
16. A computer-readable medium storing computer-executable process steps according to claim 11, wherein, in said identifying step, a first super-cell is identified according to a location and a dimension of a first table cell in a case that the first table cell is not surrounded by a white area.
17. A computer-readable medium storing computer-executable process steps according to claim 16, further comprising an identifying step to identify a dummy table cell at a row and column address within a second super-cell in a case that the second super-cell does not include a table cell at the row and column address.
18. A computer-readable medium storing computer-executable process steps according to claim 17, further comprising a storing step to store data regarding the identified super-cells in a hierarchical tree structure which reflects a physical layout of the image of the table.
19. A computer-readable medium storing computer-executable process steps according to claim 11, further comprising:
a determining step to determine a distance between traced white areas corresponding to adjacent columns;
a calculating step to calculate a location of a vertical grid line corresponding to the adjacent columns based on the determined distance.
20. A computer-readable medium storing computer-executable process steps according to claim 19, wherein an uppermost text line in the table image is ignored in a case that the table image includes more than four rows of text lines.
21. A method for analyzing a document image, comprising:
inputting the document image;
detecting connected components within the document image;
identifying a table block based on the detected connected components and on white areas within the table block image;
identifying reversed text areas within the table block;
identifying attached connected components within the table;
forming text blocks and text lines within the table block;
calculating table rows and table columns within the table image;
assigning row and column addresses to table cells within the table block;
calculating locations of vertical and horizontal table grid lines;
defining super-cells that include one or more of the table cells, wherein super-cells are identified according to the white areas;
determining whether vertical and horizontal grid lines bounding each super-cell are visible or non-visible;
splitting table cells having a row or column address range;
re-defining row and column addresses in a case that a slanted visible grid line exists within the table block;
defining a super-cell based on a super-cell hole in the table block;
defining a dummy table cell based on a cell hole within the table block; and
determining whether vertical and horizontal grid lines bounding each table cell are visible or non-visible.
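Claim 21 begins with detecting connected components but does not name an algorithm. As a stand-in, the Python sketch below uses a standard breadth-first labeling of 8-connected black pixels. It returns one bounding box per component, which is the kind of input that later steps, such as calculating table rows and columns, could consume. The image representation and the choice of 8-connectivity are assumptions.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # assumed representation: 1 = black, 0 = white
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def connected_components(image: Image) -> List[Box]:
    """Label 8-connected groups of black pixels with a breadth-first search
    and return each group's bounding box, in raster-scan order."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                # Grow the bounding box to include this pixel.
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                # Visit all eight neighbours that are black and unvisited.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

With 8-connectivity, diagonally touching pixels join one component. A 4-connected variant would restrict the neighbour loop to the four axis-aligned offsets.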
22. A computer-readable medium storing computer-executable process steps to analyze a document image, the process steps comprising:
an inputting step to input the document image;
a detecting step to detect connected components within the document image;
an identifying step to identify a table block based on the detected connected components and on white areas within the table block image;
an identifying step to identify reversed text areas within the table block;
an identifying step to identify attached connected components within the table;
a forming step to form text blocks and text lines within the table block;
a calculating step to calculate table rows and table columns within the table image;
an assigning step to assign row and column addresses to table cells within the table block;
a calculating step to calculate locations of vertical and horizontal table grid lines;
a defining step to define super-cells that include one or more of the table cells, wherein super-cells are identified according to the white areas;
a determining step to determine whether vertical and horizontal grid lines bounding each super-cell are visible or non-visible;
a splitting step to split table cells having a row or column address range;
a re-defining step to re-define row and column addresses in a case that a slanted visible grid line exists within the table block;
a defining step to define a super-cell based on a super-cell hole in the table block;
a defining step to define a dummy table cell based on a cell hole within the table block; and
a determining step to determine whether vertical and horizontal grid lines bounding each table cell are visible or non-visible.
23. An apparatus for performing block selection processing on an image of a table, the table comprised of rows and columns defined by visible and non-visible vertical and horizontal grid lines and containing super-cells, the super-cells including one or more table cells, comprising:
a memory which stores executable process steps; and
a processor which executes process steps in the memory to:
(1) identify super-cells that include one or more table cells, wherein super-cells are identified according to traced white areas surrounding table cells and bounded by visible grid lines, (2) determine whether the vertical and horizontal grid lines bounding each table cell are visible or non-visible, and (3) determine whether the vertical and horizontal grid lines bounding each super-cell are visible or non-visible.
33. An apparatus for analyzing a document image, comprising:
a memory which stores executable process steps; and
a processor which executes process steps in the memory to:
(1) input the document image, (2) detect connected components within the document image, (3) identify a table block based on the detected connected components and on white areas within the table block image, (4) identify reversed text areas within the table block, (5) identify attached connected components within the table, (6) form text blocks and text lines within the table block, (7) calculate table rows and table columns within the table image, (8) assign row and column addresses to table cells within the table block, (9) calculate locations of vertical and horizontal table grid lines, (10) define super-cells that include one or more of the table cells, wherein super-cells are identified according to the white areas, (11) determine whether vertical and horizontal grid lines bounding each super-cell are visible or non-visible, (12) split table cells having a row or column address range,
(13) re-define row and column addresses in a case that a slanted visible grid line exists within the table block, (14) define a super-cell based on a super-cell hole in the table block, (15) define a dummy table cell based on a cell hole within the table block, and (16) determine whether vertical and horizontal grid lines bounding each table cell are visible or non-visible.
Specification