Cell identification in table analysis
Abstract
The present invention handles fully-lined, semi-lined and line-less tables by identifying cells and cell separators during page recomposition, as part of optical character recognition. It does so iteratively: word boxes are merged into cells, separators are found, cells bounded by the same separators are merged, and these steps are repeated until the correct cell structure is found. Specifically, rows are estimated, close words are merged into cells, columns are then estimated, cells within columns are merged, columns are re-estimated, cells in the same row and column are merged into bigger cells, and finally rows and cells are merged according to the detected table style. The method handles large, complex tables, including cells that contain multiple lines of symbols, in lined, semi-lined and line-less tables alike.
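The first two steps the abstract names (estimating rows, then merging close words into cells) can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: word boxes are axis-aligned rectangles in millimetres, a word joins a row when it overlaps that row's vertical band, and words are merged when the horizontal gap is at most a hypothetical `max_gap` of 2 mm. The names `Box`, `estimate_rows` and `merge_close_words` are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned word or cell box in page coordinates (mm); y grows downward."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: Box) -> Box:
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def estimate_rows(words: list[Box]) -> list[list[Box]]:
    """Group word boxes into rows: a word joins the current row if it overlaps the row's vertical band."""
    rows: list[list[Box]] = []
    band_bottom = float("-inf")
    for w in sorted(words, key=lambda b: b.y0):
        if rows and w.y0 < band_bottom:
            rows[-1].append(w)
            band_bottom = max(band_bottom, w.y1)
        else:
            rows.append([w])
            band_bottom = w.y1
    return rows


def merge_close_words(row: list[Box], max_gap: float = 2.0) -> list[Box]:
    """Merge words in one row into cells when the gap to the previous cell is <= max_gap (hypothetical threshold)."""
    cells: list[Box] = []
    for w in sorted(row, key=lambda b: b.x0):
        if cells and w.x0 - cells[-1].x1 <= max_gap:
            cells[-1] = cells[-1].union(w)
        else:
            cells.append(w)
    return cells


words = [Box(10, 5, 20, 9), Box(21, 5, 30, 9), Box(50, 5, 60, 9), Box(10, 15, 25, 19)]
rows = estimate_rows(words)
cells = [merge_close_words(r) for r in rows]
# cells[0]: the words spanning x 10-20 and 21-30 merge into one cell; x 50-60 stays separate
```

A fixed gap threshold is the simplest choice; a real system would more likely scale it to the font size of the words involved.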
9 Claims
1. A method of identifying, during page recomposition, cells in a table scanned by optical character scanning means as part of an optical character recognition process, comprising the steps of:
segmenting said table into a tabular region of rows and columns of individual cells;
merging individual cells which share the same row and column;
determining a plurality of vertical and horizontal rulings;
combining adjacent vertical and horizontal rulings to form vertical and horizontal frames;
merging cells which share the same horizontal and vertical frame;
expanding cells to fit within the nearest vertical and horizontal frame; and
returning said table to the page recomposition process.
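The frame-based steps of claim 1 (merging cells that share the same frames and expanding cells to the nearest frames) might be sketched as below. This assumes each frame has already been reduced to a single coordinate, that the frame lists are sorted and include the table's outer borders, and it performs expansion and merging in one pass rather than as separate steps. `Rect`, `enclosing` and `merge_by_frames` are hypothetical names for illustration only.

```python
from bisect import bisect_left, bisect_right

Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1) in mm


def enclosing(lo: float, hi: float, frames: list[float]) -> tuple[float, float]:
    """Nearest frame at or before lo and nearest frame at or after hi.

    Assumes frames is sorted and brackets every cell (i.e. includes the table borders).
    """
    before = frames[bisect_right(frames, lo) - 1]
    after = frames[bisect_left(frames, hi)]
    return before, after


def merge_by_frames(cells: list[Rect], vframes: list[float], hframes: list[float]) -> list[Rect]:
    """Expand each cell to its enclosing frames; cells that share the same frames collapse into one cell."""
    merged: set[Rect] = set()
    for x0, y0, x1, y1 in cells:
        left, right = enclosing(x0, x1, vframes)
        top, bottom = enclosing(y0, y1, hframes)
        merged.add((left, top, right, bottom))
    return sorted(merged)


raw_cells = [(2, 1, 15, 8), (18, 1, 35, 8), (45, 1, 90, 8), (2, 12, 30, 18)]
table = merge_by_frames(raw_cells, vframes=[0, 40, 100], hframes=[0, 10, 20])
```

Here the first two cells fall between the same pair of vertical frames and the same pair of horizontal frames, so they become a single cell spanning x 0 to 40 and y 0 to 10.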
6. A method of identifying, during page recomposition, cells in a table scanned by optical character scanning means as part of an optical character recognition process, comprising the steps of:
segmenting said table into a tabular region of rows and columns of individual cells of words;
merging cells of individual words which share the same row and column;
determining midpoints of gaps between said columns to form vertical rulings;
determining midpoints of gaps between said rows to form horizontal rulings;
combining vertical rulings which are within 4.5 mm horizontally and which overlap vertically to form vertical frames;
combining horizontal rulings which are within 1.5 mm vertically and which overlap horizontally to form horizontal frames;
merging cells which share the same horizontal and vertical frame;
expanding cells to fit within the nearest vertical and horizontal frame; and
returning said table to the page recomposition process.
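The ruling and frame construction of claim 6 might look like the sketch below. Only the gap-midpoint rule, the 4.5 mm and 1.5 mm tolerances, and the overlap condition come from the claim. The `Ruling` type, the function names, the caller-supplied extent of each ruling (here the band of the row the gaps were measured in), and the rule that a combined frame sits midway between the two rulings it replaces while spanning both extents are all assumptions for illustration.

```python
from dataclasses import dataclass

VERTICAL_TOL_MM = 4.5    # claim 6: vertical rulings within 4.5 mm horizontally
HORIZONTAL_TOL_MM = 1.5  # claim 6: horizontal rulings within 1.5 mm vertically


@dataclass
class Ruling:
    pos: float    # x of a vertical ruling or y of a horizontal ruling (mm)
    start: float  # extent along the ruling's own direction (mm)
    end: float


def gap_rulings(spans: list[tuple[float, float]], start: float, end: float) -> list[Ruling]:
    """Place a ruling at the midpoint of each gap between consecutive spans.

    spans are (lo, hi) extents of columns (for vertical rulings) or rows (for
    horizontal rulings); overlapping neighbours leave no gap and get no ruling.
    """
    spans = sorted(spans)
    return [Ruling((a_hi + b_lo) / 2, start, end)
            for (_, a_hi), (b_lo, _) in zip(spans, spans[1:])
            if b_lo > a_hi]


def combinable(a: Ruling, b: Ruling, tol: float) -> bool:
    """Within tol of each other and overlapping (touching counts) along their extents."""
    return abs(a.pos - b.pos) <= tol and a.start <= b.end and b.start <= a.end


def combine_rulings(rulings: list[Ruling], tol: float) -> list[Ruling]:
    """Combine rulings into frames until no combinable pair remains."""
    frames = list(rulings)
    i = 0
    while i < len(frames):
        for j in range(i + 1, len(frames)):
            if combinable(frames[i], frames[j], tol):
                a, b = frames[i], frames.pop(j)
                # Assumed rule: frame sits midway between the two, spanning both extents.
                frames[i] = Ruling((a.pos + b.pos) / 2, min(a.start, b.start), max(a.end, b.end))
                i = 0  # restart: the grown frame may now reach earlier rulings
                break
        else:
            i += 1
    return sorted(frames, key=lambda r: (r.pos, r.start))


# Two rows whose column gaps drift slightly; row bands touch at y = 9 mm.
row1 = gap_rulings([(0, 30), (40, 70), (80, 100)], start=0, end=9)
row2 = gap_rulings([(0, 32), (41, 70), (79, 100)], start=9, end=18)
vframes = combine_rulings(row1 + row2, VERTICAL_TOL_MM)
```

With these inputs the four per-row rulings collapse into two full-height vertical frames. Treating touching extents as overlapping is itself a design choice; without it, rulings measured in adjacent rows would never combine.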
Specification