Method for identifying and using table structures
Abstract
A method for recognizing a table structure from a delineated table region in an electronic document using hierarchical clustering of data strings. The cluster groupings are segregated using distances derived from positional vectors associated with words and groups of words, rather than a minimum number of blank spaces between words. Once a data tree of the hierarchical clusterings is constructed, the tree is scanned downward from the root to find appropriate column boundaries using a columnization algorithm. Successive heuristic algorithms then determine the column headers, the row headers, and the row boundaries.
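The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each word's positional vector is its horizontal character span (start, end), and it uses the horizontal separation between two spans as the distance. The names `Node`, `span_distance`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    start: int                      # leftmost character position of the cluster
    end: int                        # rightmost character position of the cluster
    words: list[str]                # all words grouped under this node
    left: Optional["Node"] = None   # children are None for a single word (leaf)
    right: Optional["Node"] = None


def span_distance(a: Node, b: Node) -> int:
    """Assumed metric: horizontal separation of two spans, 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def build_tree(words: list[tuple[str, int]]) -> Node:
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    `words` holds (text, start_position) pairs taken from the table region.
    """
    if not words:
        raise ValueError("table region contains no words")
    clusters = [Node(x, x + len(w) - 1, [w]) for w, x in words]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: span_distance(*p))
        clusters = [c for c in clusters if c is not a and c is not b]
        merged = Node(min(a.start, b.start), max(a.end, b.end),
                      a.words + b.words, a, b)
        clusters.append(merged)
    return clusters[0]  # root of the binary tree


# Two visually aligned columns, three text lines flattened into one word list.
root = build_tree([("Name", 0), ("Age", 10), ("Alice", 0),
                   ("30", 10), ("Bob", 0), ("25", 10)])
```

In this sketch, words whose spans overlap merge first, so vertically aligned words end up under the same subtree regardless of how many blank spaces separate them on any single line.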
21 Claims
1. A method for recognizing the structure of a delineated table region in an electronic document, comprising the steps of:
a) creating a binary tree using a hierarchical clustering of a plurality of words included in said table region;
b) segregating a plurality of table columns using a breadth-first traversal algorithm;
c) identifying column headers, if any, using a first heuristic algorithm; and
d) identifying row headers, if any, using a second heuristic algorithm; and
e) segregating at least one table row using a row determination algorithm.
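Step b) of claim 1 can be sketched as a breadth-first scan of the cluster tree. The following Python example is illustrative only. The claim does not specify the splitting rule, so the sketch assumes a node is split into its children whenever the horizontal gap between them is at least `min_gap` characters. The `Cluster` type, `gap`, `find_columns`, and the threshold value are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    start: int                         # leftmost character position
    end: int                           # rightmost character position
    left: Optional["Cluster"] = None   # both children are None for a leaf
    right: Optional["Cluster"] = None


def gap(a: Cluster, b: Cluster) -> int:
    """Horizontal separation between two spans, 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def find_columns(root: Cluster, min_gap: int = 2) -> list[tuple[int, int]]:
    """Breadth-first traversal from the root (assumed splitting rule).

    A node whose children are separated by at least `min_gap` is split;
    any other node is accepted as one column.
    """
    columns = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        has_children = node.left is not None and node.right is not None
        if has_children and gap(node.left, node.right) >= min_gap:
            queue.append(node.left)
            queue.append(node.right)
        else:
            columns.append((node.start, node.end))
    return sorted(columns)


# Hand-built tree for a three-column table.
col_a = Cluster(0, 4, Cluster(0, 3), Cluster(0, 4))   # two overlapping words
col_b = Cluster(10, 12)
col_c = Cluster(20, 25)
tree = Cluster(0, 25, Cluster(0, 12, col_a, col_b), col_c)
columns = find_columns(tree)
```

Because the traversal stops descending as soon as a node's children overlap or nearly touch, words stacked in the same column stay grouped even though they are separate leaves in the tree.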
20. A method for querying an electronic table comprising the steps of:
a) creating and storing a first list of keywords, said keywords representing the cells of a table;
b) creating and storing a second list of keywords to be used to determine actions to be taken with said table;
c) parsing said query for at least one action keyword that matches at least one word included in said second list of keywords;
d) parsing said query for at least one keyword that matches at least one word in the first list of keywords.
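The two-list parsing of claim 20 can be sketched as follows. This is a minimal Python illustration, not the claimed implementation. It assumes whitespace tokenization, case-insensitive matching, and stripping of trailing punctuation, none of which the claim specifies. The function name `parse_query` and the sample keyword lists are hypothetical.

```python
def parse_query(query: str,
                cell_keywords: set[str],
                action_keywords: set[str]) -> tuple[list[str], list[str]]:
    """Return (actions, cells) found in the query, in order of appearance."""
    # Assumed normalization: split on whitespace, drop punctuation, lowercase.
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    actions = [t for t in tokens if t in action_keywords]   # step c)
    cells = [t for t in tokens if t in cell_keywords]       # step d)
    return actions, cells


# Step a): keywords standing for cells of the table (sample data).
cell_keywords = {"price", "quantity", "widget"}
# Step b): keywords that select an action to take on the table (sample data).
action_keywords = {"show", "sum", "average"}

actions, cells = parse_query("Show the average price of widget?",
                             cell_keywords, action_keywords)
```

Keeping the two lists separate lets the action vocabulary stay fixed while the cell vocabulary is rebuilt for each recognized table.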
Specification