Identification of Tables in an Unstructured Document
Abstract
Some embodiments provide a method for analyzing an unstructured document that includes a number of glyphs. The method identifies boundaries between sets of glyphs. The method identifies that several of the boundaries form a table. The method defines a tabular structural element based on the table. The tabular structural element includes several cells arranged in a plurality of rows and columns, each of which includes an associated set of glyphs.
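The three steps in the abstract (identify boundaries, recognize that they form a table, define cells holding glyphs) can be illustrated with a minimal sketch. It rests on assumptions not stated in this section: each glyph is reduced to an axis-aligned box, a boundary is taken to be an empty horizontal or vertical gap at least `min_gap` wide, and a table is declared whenever at least one row boundary and one column boundary exist. The names `Glyph`, `find_gaps`, `build_table`, and `cell_text` are hypothetical, and the gap heuristic is a stand-in, not the method the patent describes.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Glyph:
    """A primitive element: one character with an assumed bounding box."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float = 1.0
    h: float = 1.0


def find_gaps(intervals: List[Tuple[float, float]], min_gap: float) -> List[float]:
    """Return the midpoints of empty gaps at least min_gap wide."""
    boundaries = []
    covered_end = None
    for start, end in sorted(intervals):
        if covered_end is not None and start - covered_end >= min_gap:
            boundaries.append((covered_end + start) / 2)
        covered_end = end if covered_end is None else max(covered_end, end)
    return boundaries


def build_table(glyphs: List[Glyph], min_gap: float = 2.0) -> Optional[List[List[List[Glyph]]]]:
    """Step 1: find boundaries. Step 2: check they form a grid. Step 3: fill cells."""
    col_bounds = find_gaps([(g.x, g.x + g.w) for g in glyphs], min_gap)
    row_bounds = find_gaps([(g.y, g.y + g.h) for g in glyphs], min_gap)
    if not col_bounds or not row_bounds:
        return None  # assumed rule: no crossing boundaries, so no table
    cells = [[[] for _ in range(len(col_bounds) + 1)]
             for _ in range(len(row_bounds) + 1)]
    for g in glyphs:
        row = bisect.bisect(row_bounds, g.y)
        col = bisect.bisect(col_bounds, g.x)
        cells[row][col].append(g)
    return cells


def cell_text(cell: List[Glyph]) -> str:
    """Join a cell's glyphs in reading order (top to bottom, left to right)."""
    return "".join(g.char for g in sorted(cell, key=lambda g: (g.y, g.x)))


glyphs = [Glyph("A", 0, 0), Glyph("B", 1, 0), Glyph("C", 10, 0),
          Glyph("D", 0, 10), Glyph("E", 10, 10)]
table = build_table(glyphs)
grid = [[cell_text(cell) for cell in row] for row in table]
```

Here "A" and "B" touch and therefore share a cell, while the wide gaps around x = 6 and y = 5.5 become the single column and row boundaries. A real analyzer would also need to handle ruled lines, spanning cells, and text that merely happens to align.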
25 Claims
1. A computer readable medium storing a computer program which when executed by at least one processor analyzes a document comprising a plurality of primitive elements, the computer program comprising sets of instructions for:
identifying boundaries between sets of primitive elements;
identifying that a plurality of the boundaries form a table; and
defining a tabular structural element for the table, the tabular structural element comprising a plurality of cells arranged in a plurality of rows and columns, each cell comprising an associated set of primitive elements.
Dependent claims: 2 through 20.
21. A method for defining a program for (i) analyzing a document comprising a plurality of primitive elements and (ii) generating structural elements that define structure in said document based on said analysis, the method comprising:
defining a module for identifying boundaries between sets of primitive elements;
defining a module for identifying that a plurality of the boundaries form a table; and
defining a module for defining a tabular structural element based on the table, the tabular structural element comprising a plurality of cells arranged in a plurality of rows and columns, each cell comprising an associated set of primitive elements.
22. A computer readable medium storing a computer program which when executed by at least one processor analyzes a document comprising a plurality of primitive elements, the computer program comprising sets of instructions for:
identifying a first set of primitive elements that comprise a table;
defining a tabular structural element for the first set of primitive elements;
identifying a second set of primitive elements that do not comprise a table;
defining a set of non-tabular structural elements for the second set of primitive elements;
defining a structured document comprising the tabular structural element and the set of non-tabular structural elements.
Dependent claims: 23 through 25.
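Claim 22 describes an output that mixes one tabular structural element with non-tabular ones. The sketch below shows one way such a structured document could be represented. It assumes the primitive elements have already been grouped into lines of text runs, and it uses a deliberately crude rule (two or more consecutive lines with the same number of runs, at least two runs each, count as a table). The types `TableElement`, `TextElement`, `StructuredDocument` and the function `build_document` are hypothetical names for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Union


@dataclass
class TableElement:
    """Tabular structural element: rows of cell texts."""
    rows: List[List[str]]


@dataclass
class TextElement:
    """Non-tabular structural element: a single line of text."""
    text: str


@dataclass
class StructuredDocument:
    """Ordered mix of tabular and non-tabular structural elements."""
    elements: List[Union[TableElement, TextElement]]


def build_document(lines: List[List[str]]) -> StructuredDocument:
    """Split lines into table blocks and plain text using the assumed rule."""
    elements: List[Union[TableElement, TextElement]] = []
    # Group consecutive lines that have the same number of text runs.
    for width, group in groupby(lines, key=len):
        block = list(group)
        if width >= 2 and len(block) >= 2:
            elements.append(TableElement(rows=block))
        else:
            elements.extend(TextElement(" ".join(line)) for line in block)
    return StructuredDocument(elements)


lines = [
    ["Quarterly results"],
    ["Region", "Sales"],
    ["North", "10"],
    ["South", "12"],
    ["End of report"],
]
doc = build_document(lines)
```

For this input the result holds a text element, one table element with three rows, and a closing text element, in that order. Keeping both kinds of element in a single ordered list preserves reading order, which is the main design choice being illustrated.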
Specification