Identification of regions of a document
First Claim
1. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document that comprises a plurality of primitive elements, the primitive elements are present in the document prior to analysis by the program, the program comprising sets of instructions for:
identifying a set of potential boundaries as rectilinearly aligned graphical primitive elements that satisfy a set of size constraints;
identifying a subset of the potential boundaries as actual boundaries by eliminating from the set (i) potential boundaries identified from the graphical primitive elements that intersect with primitive elements that are not in the set of potential boundaries and (ii) potential boundaries identified from the graphical primitive elements that do not intersect at least two additional potential boundaries;
identifying regions bounded by the actual boundaries; and
defining a structured document based on the regions and the primitive elements.
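The elimination step of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Box` type, the `actual_boundaries` function, the use of bounding-box overlap as the intersection test, and the single-pass evaluation of both criteria against the original candidate set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a primitive element (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Closed-interval overlap test, so boxes that merely touch also count.
        return (self.x0 <= other.x1 and other.x0 <= self.x1
                and self.y0 <= other.y1 and other.y0 <= self.y1)


def actual_boundaries(potential: list[Box], others: list[Box]) -> list[Box]:
    """Keep the potential boundaries that survive both elimination criteria.

    potential: candidates already selected by the size constraints.
    others:    primitive elements that are not in the candidate set
               (for example glyph or image bounding boxes).
    Assumption: one pass, with neighbours counted in the original candidate set.
    """
    kept = []
    for b in potential:
        # (i) eliminate a candidate that intersects a non-candidate primitive
        if any(b.intersects(o) for o in others):
            continue
        # (ii) eliminate a candidate that meets fewer than two other candidates
        neighbours = sum(1 for p in potential if p is not b and b.intersects(p))
        if neighbours < 2:
            continue
        kept.append(b)
    return kept
```

In this sketch, four thin rules forming a frame would all be kept, while an isolated rule or a rule overlapped by a glyph box would be eliminated.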
1 Assignment
0 Petitions
Abstract
Some embodiments provide a method for analyzing a document that includes a number of primitive elements. The method identifies boundaries between sets of primitive elements and identifies regions bounded by the boundaries. The method uses the identified regions to define structural elements for the document. The method defines a structured document based on the primitive elements and the structural elements.
138 Citations
37 Claims
1. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document that comprises a plurality of primitive elements, the primitive elements are present in the document prior to analysis by the program, the program comprising sets of instructions for:
identifying a set of potential boundaries as rectilinearly aligned graphical primitive elements that satisfy a set of size constraints;
identifying a subset of the potential boundaries as actual boundaries by eliminating from the set (i) potential boundaries identified from the graphical primitive elements that intersect with primitive elements that are not in the set of potential boundaries and (ii) potential boundaries identified from the graphical primitive elements that do not intersect at least two additional potential boundaries;
identifying regions bounded by the actual boundaries; and
defining a structured document based on the regions and the primitive elements.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document that comprises a plurality of primitive elements, the program comprising sets of instructions for:
identifying a set of potential boundaries as (i) rectilinearly aligned graphical elements that have less than a threshold thickness in one rectilinear direction and (ii) edges of rectilinearly aligned graphical elements that satisfy size constraints, wherein graphical elements that satisfy the size constraints have at least a threshold height and width;
identifying as actual boundaries a portion of the potential boundaries identified from the rectilinearly aligned graphical elements, by eliminating from the set (i) potential boundaries that intersect with primitive elements that are not in the set of potential boundaries and (ii) the potential boundaries that do not intersect with at least two additional potential boundaries;
identifying regions bounded by the actual boundaries; and
defining a structured document based on the regions and the primitive elements.
Dependent claims: 19, 20, 21
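The first step of claim 18 names two sources of potential boundaries: thin graphical elements, and the edges of graphical elements that are at least a threshold height and width. The Python sketch below shows one way that selection could look. The `Rect` type, the `potential_boundaries` function, and the default threshold values are hypothetical, and representing an edge as a zero-thickness rectangle is an assumption of this sketch rather than something the claim states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned graphical element (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def potential_boundaries(shapes: list[Rect],
                         max_thickness: float = 2.0,
                         min_size: float = 20.0) -> list[Rect]:
    """Collect candidate boundaries from rectilinear graphical elements.

    max_thickness and min_size are illustrative values, not taken from the patent.
    """
    candidates = []
    for s in shapes:
        if min(s.width, s.height) < max_thickness:
            # (i) thin in one rectilinear direction: the element itself is a candidate
            candidates.append(s)
        elif s.width >= min_size and s.height >= min_size:
            # (ii) large enough in both directions: each of its four edges is a candidate
            candidates.extend([
                Rect(s.x0, s.y0, s.x1, s.y0),  # horizontal edge at y0
                Rect(s.x0, s.y1, s.x1, s.y1),  # horizontal edge at y1
                Rect(s.x0, s.y0, s.x0, s.y1),  # vertical edge at x0
                Rect(s.x1, s.y0, s.x1, s.y1),  # vertical edge at x1
            ])
        # Elements that are neither thin nor large contribute no candidates.
    return candidates
```

Under these assumptions a hairline rule contributes one candidate, a large filled rectangle contributes four, and a small icon-sized shape contributes none.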
22. A method for analyzing a document that comprises a plurality of primitive elements, the method comprising:
identifying a set of potential boundaries as (i) rectilinearly aligned graphical elements that have less than a threshold thickness in one rectilinear direction and (ii) edges of rectilinearly aligned graphical elements that satisfy size constraints, wherein graphical elements that satisfy the size constraints have at least a threshold height and width;
identifying as actual boundaries a portion of the potential boundaries identified from the rectilinearly aligned graphical elements, by eliminating from the set (i) potential boundaries that intersect with primitive elements that are not in the set of potential boundaries and (ii) the potential boundaries that do not intersect with at least two additional potential boundaries;
identifying regions bounded by the actual boundaries; and
defining a structured document based on the regions and the primitive elements.
Dependent claims: 23, 24, 25
26. A method for analyzing a document that comprises a plurality of primitive elements, the primitive elements are present in the document prior to analysis, the method comprising:
identifying a set of potential boundaries as rectilinearly aligned graphical primitive elements that satisfy a set of size constraints;
identifying a subset of the potential boundaries as actual boundaries by eliminating from the set (i) potential boundaries identified from the graphical primitive elements that intersect with primitive elements that are not in the set of potential boundaries and (ii) potential boundaries identified from the graphical primitive elements that do not intersect at least two additional potential boundaries;
identifying regions bounded by the actual boundaries; and
defining a structured document based on the regions and the primitive elements.
Dependent claims: 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37
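The claims recite "identifying regions bounded by the actual boundaries" without fixing a procedure. As one possible reading, the Python sketch below treats each actual boundary as a zero-thickness axis-aligned segment, cuts the page into grid cells at every segment coordinate, and flood-fills cells that are not separated by a segment. Components that cannot reach the outside are reported as regions. The function name `find_regions`, the segment representation, and the grid-and-flood-fill technique are assumptions of this sketch, not the method described in the patent.

```python
from collections import deque

# (x0, y0, x1, y1); each segment is assumed horizontal or vertical.
Segment = tuple[float, float, float, float]


def find_regions(segments: list[Segment]) -> list[list[Segment]]:
    """Return each enclosed region as a sorted list of grid-cell rectangles."""
    segs = [(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
            for x0, y0, x1, y1 in segments]
    if not segs:
        return []
    xs = sorted({v for s in segs for v in (s[0], s[2])})
    ys = sorted({v for s in segs for v in (s[1], s[3])})
    # Pad with one ring of outside cells so leaks to the exterior are detectable.
    xs = [xs[0] - 1] + xs + [xs[-1] + 1]
    ys = [ys[0] - 1] + ys + [ys[-1] + 1]
    nx, ny = len(xs) - 1, len(ys) - 1

    def wall_v(x, ya, yb):
        # True if a vertical segment at x covers the cell edge [ya, yb].
        return any(s[0] == s[2] == x and s[1] <= ya and s[3] >= yb for s in segs)

    def wall_h(y, xa, xb):
        # True if a horizontal segment at y covers the cell edge [xa, xb].
        return any(s[1] == s[3] == y and s[0] <= xa and s[2] >= xb for s in segs)

    def neighbours(i, j):
        if i + 1 < nx and not wall_v(xs[i + 1], ys[j], ys[j + 1]):
            yield i + 1, j
        if i > 0 and not wall_v(xs[i], ys[j], ys[j + 1]):
            yield i - 1, j
        if j + 1 < ny and not wall_h(ys[j + 1], xs[i], xs[i + 1]):
            yield i, j + 1
        if j > 0 and not wall_h(ys[j], xs[i], xs[i + 1]):
            yield i, j - 1

    seen = set()

    def flood(start):
        # Breadth-first fill over cells not separated by a boundary segment.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for n in neighbours(*cell):
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        return component

    flood((0, 0))  # mark everything reachable from outside the boundaries
    regions = []
    for i in range(nx):
        for j in range(ny):
            if (i, j) not in seen:
                cells = flood((i, j))
                regions.append(sorted((xs[a], ys[b], xs[a + 1], ys[b + 1])
                                      for a, b in cells))
    return regions
```

With this approach a frame split by one full-height divider yields two regions, and removing one side of the frame causes the adjacent cell to leak to the outside so that it is no longer reported.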
Specification