Analyzing lines to detect tables in documents
Abstract
Various technologies and techniques detect tables in vector graphics based documents and use them in meaningful ways. The system detects at least one table in a vector graphics based document using a set of rules. The rules include analyzing a set of content representing horizontal and vertical lines to find intersections and identifying table cells based on the intersections. Once identified, the table content is translated into a modified format. The content can be output to a destination application in the modified format that is more suitable for output or use by the destination application.
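The intersection rule in the abstract can be illustrated with a minimal sketch. It assumes the drawing instructions have already been reduced to axis-aligned segments; the `HLine` and `VLine` types, the `find_intersections` helper, and the `tol` tolerance are hypothetical names introduced here, not taken from the patent.

```python
from typing import NamedTuple


class HLine(NamedTuple):
    """Horizontal segment at height y spanning x1..x2 (hypothetical type)."""
    y: float
    x1: float
    x2: float


class VLine(NamedTuple):
    """Vertical segment at position x spanning y1..y2 (hypothetical type)."""
    x: float
    y1: float
    y2: float


def find_intersections(hlines, vlines, tol=0.5):
    """Return the sorted (x, y) points where a horizontal and a vertical line meet.

    tol is an assumed slack for lines that stop slightly short of each other.
    """
    points = set()
    for h in hlines:
        for v in vlines:
            crosses_x = h.x1 - tol <= v.x <= h.x2 + tol
            crosses_y = v.y1 - tol <= h.y <= v.y2 + tol
            if crosses_x and crosses_y:
                points.add((v.x, h.y))
    return sorted(points)


# Two cells side by side: two horizontal rules and three vertical rules.
hlines = [HLine(0, 0, 20), HLine(10, 0, 20)]
vlines = [VLine(0, 0, 10), VLine(10, 0, 10), VLine(20, 0, 10)]
points = find_intersections(hlines, vlines)
```

Each pair of neighbouring intersections along both axes is then a candidate corner set for a table cell.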
17 Claims
1. A method for recognizing tables in vector graphics based documents comprising:
receiving a document in an original format, the original format having at least a set of table rendering instructions for at least one table in the document;
parsing the document to determine that the document comprises at least one table and at least one nested table and to identify a set of contents for the table; and
outputting the table and the nested table to an output medium, the output medium presenting the table in a modified format, where determining that the document comprises at least one nested table comprises:
analyzing a first cell of the table to determine whether the first cell comprises lines therein that do not intersect borders of the first cell, and
upon analyzing the first cell and finding a first nested cell, utilizing the first nested cell as a reference and determining if a second nested cell occurs adjacent to or below the first nested cell.
Dependent claims: 3, 4, 5, 6, 8, 9
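The first step of the nested-table test in claim 1 (looking for lines inside a cell that do not touch its borders) might be sketched as follows. `Rect`, `Segment`, `lines_strictly_inside`, and the `margin` parameter are hypothetical names chosen for illustration; the claim does not specify data structures or tolerances.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Cell bounds (hypothetical type); y grows downward."""
    left: float
    top: float
    right: float
    bottom: float


class Segment(NamedTuple):
    """Axis-aligned line segment (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def lines_strictly_inside(cell, segments, margin=0.5):
    """Return segments that lie inside the cell without reaching its borders.

    margin is an assumed clearance; a segment whose endpoint falls within
    margin of a border is treated as touching that border.
    """
    def inside(x, y):
        return (cell.left + margin < x < cell.right - margin
                and cell.top + margin < y < cell.bottom - margin)

    # The cell is convex, so both endpoints inside means the whole segment is.
    return [s for s in segments if inside(s.x1, s.y1) and inside(s.x2, s.y2)]


cell = Rect(0, 0, 100, 100)
segments = [
    Segment(10, 10, 90, 10),   # floats inside the cell: nested-table candidate
    Segment(0, 50, 100, 50),   # runs border to border: splits the outer cell
]
inner = lines_strictly_inside(cell, segments)
has_nested_candidate = bool(inner)
```

Any inner lines found this way would then be run through the same cell-detection step as the outer table to locate the first nested cell.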
2. A method for recognizing tables in vector graphics based documents comprising:
receiving a document in an original format, the original format having at least a set of table rendering instructions for at least one table in the document;
parsing the document to determine that the document comprises at least one table and at least one nested table and to identify a set of contents for the table; and
upon determining that the document comprises at least one table, determining that the table does not comprise a false positive, comprising:
verifying that the table comprises a first cell and a second cell;
confirming that the table comprises text;
checking that column edges of the table align; and
checking that rows of the table share top and bottom edges.
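The four false-positive checks in claim 2 can be read as a simple validation pass. The sketch below is one possible interpretation, not the patented implementation: the `Cell` type, the `is_plausible_table` function, and the `tol` tolerance are assumptions, and column alignment is compared against the first row only, which ignores merged cells.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Detected cell with its bounds and extracted text (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float
    text: str = ""


def is_plausible_table(rows, tol=0.5):
    """rows is a list of rows, each a list of Cell ordered left to right."""
    cells = [cell for row in rows for cell in row]
    # Check 1: the table has at least a first and a second cell.
    if len(cells) < 2:
        return False
    # Check 2: the table contains some text.
    if not any(cell.text.strip() for cell in cells):
        return False
    # Check 3 (claim's fourth): cells in a row share top and bottom edges.
    for row in rows:
        for cell in row:
            if (abs(cell.top - row[0].top) > tol
                    or abs(cell.bottom - row[0].bottom) > tol):
                return False
    # Check 4 (claim's third): column edges line up with the first row.
    for row in rows[1:]:
        for ref, cell in zip(rows[0], row):
            if (abs(cell.left - ref.left) > tol
                    or abs(cell.right - ref.right) > tol):
                return False
    return True


good = [
    [Cell(0, 0, 10, 5, "a"), Cell(10, 0, 20, 5, "b")],
    [Cell(0, 5, 10, 10, "c"), Cell(10, 5, 20, 10, "d")],
]
skewed = [
    [Cell(0, 0, 10, 5, "a"), Cell(10, 0, 20, 5, "b")],
    [Cell(0, 5, 12, 10, "c"), Cell(12, 5, 20, 10, "d")],
]
```

Here `is_plausible_table(good)` passes all four checks, while `skewed` fails the column-edge check because its second row is split at a different x position.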
7. A method for recognizing tables in vector graphics based documents comprising:
receiving a document in an original format, the original format having at least a set of table rendering instructions for at least one table in the document; and
parsing the document to determine that the document comprises at least one table and at least one nested table and to identify a set of contents for the table, where parsing the document comprises:
grouping a plurality of horizontal lines together,
grouping a plurality of vertical lines together,
looking for intersections that are present between the horizontal lines and the vertical lines, and
identifying a plurality of table cells based on the intersections and where identifying a plurality of table cells comprises:
determining whether intersecting horizontal and vertical lines form a first table cell; and
using the first table cell as a reference cell to determine if a same pattern occurs adjacent to the first table cell or below the first table cell.
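The parsing steps of claim 7 (group lines by orientation, find intersections, form cells, then look next to and below a reference cell) could be sketched as below. All names are hypothetical, and the sketch makes a simplifying assumption: a rectangle counts as a cell when its four corners are intersections, without verifying that a line runs along each edge.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Axis-aligned line segment (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def split_by_orientation(segments):
    """Group horizontal lines together and vertical lines together."""
    horizontal = [s for s in segments if s.y1 == s.y2]
    vertical = [s for s in segments if s.x1 == s.x2]
    return horizontal, vertical


def intersections(horizontal, vertical):
    """Set of (x, y) points where a horizontal and a vertical line cross."""
    points = set()
    for h in horizontal:
        for v in vertical:
            if (min(h.x1, h.x2) <= v.x1 <= max(h.x1, h.x2)
                    and min(v.y1, v.y2) <= h.y1 <= max(v.y1, v.y2)):
                points.add((v.x1, h.y1))
    return points


def find_cells(points):
    """Cells as (left, top, right, bottom), row by row, left to right."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            corners = {(left, top), (right, top), (left, bottom), (right, bottom)}
            if corners <= points:
                cells.append((left, top, right, bottom))
    return cells


def neighbours_of(reference, cells):
    """Cells repeating the reference pattern to its right and directly below."""
    left, top, right, bottom = reference
    adjacent = [c for c in cells if c[0] == right and c[1] == top and c[3] == bottom]
    below = [c for c in cells if c[1] == bottom and c[0] == left and c[2] == right]
    return adjacent, below


# A 2 x 2 grid drawn with three horizontal and three vertical rules.
segments = [Segment(0, y, 20, y) for y in (0, 10, 20)]
segments += [Segment(x, 0, x, 20) for x in (0, 10, 20)]
horizontal, vertical = split_by_orientation(segments)
cells = find_cells(intersections(horizontal, vertical))
adjacent, below = neighbours_of(cells[0], cells)
```

With the top-left cell as the reference, `adjacent` holds the cell to its right and `below` the cell underneath, which is the repetition the claim uses as evidence of a table.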
10. A computer-readable storage medium having computer-executable instructions for causing a computer to perform a method comprising:
detecting at least one table within a vector graphics based document using a set of rules, the rules comprising:
analyzing a set of content representing at least one horizontal and vertical line to find intersections; and
identifying a plurality of table cells based on the intersections;
determining that the document comprises at least one nested table comprising:
analyzing a first cell of the table to determine whether the first cell comprises lines therein that do not intersect borders of the first cell, and
upon analyzing the first cell and finding a first nested cell, utilizing the first nested cell as a reference and determining if a second nested cell occurs adjacent to or below the first nested cell;
translating the at least one table to a modified format; and
outputting the modified format to an output medium.
Dependent claims: 11, 12, 13, 14
15. A method for interpreting vector graphics based documents comprising the steps of:
receiving an input from a user to copy a section of content, the content having an original format that includes a set of table rendering instructions;
interpreting the table rendering instructions to detect that the document comprises at least one table;
upon determining that the document comprises at least one table, determining that the table does not comprise a false positive, comprising:
verifying that the table comprises a first cell and a second cell;
confirming that the table comprises text;
checking that column edges of the table align and checking that rows of the table share top and bottom edges;
translating the content including the at least one table into a modified format; and
providing the content in the modified format to the destination application for output.
Dependent claims: 16, 17
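The translation step in claim 15 does not name a target format. As an illustration only, the sketch below assumes the detected table is already a list of rows of cell text and renders it as tab-separated text and as an HTML table, two formats a destination application receiving copied content might accept. The function names `to_tsv` and `to_html` are hypothetical.

```python
from html import escape


def to_tsv(rows):
    """Render rows of cell text as tab-separated lines.

    Tabs and newlines inside a cell are replaced with spaces so that the
    row and column structure survives the round trip.
    """
    return "\n".join(
        "\t".join(cell.replace("\t", " ").replace("\n", " ") for cell in row)
        for row in rows
    )


def to_html(rows):
    """Render rows of cell text as a minimal HTML table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


rows = [["Item", "Qty"], ["A & B", "2"]]
tsv_text = to_tsv(rows)
html_text = to_html(rows)
```

Offering more than one representation lets the destination application pick the richest format it understands.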
Specification