Identification of compound graphic elements in an unstructured document
Abstract
Some embodiments provide a method of analyzing an unstructured document. The method receives the unstructured document, which includes a number of primitive graphic elements, each of which is defined as a single object in the unstructured document. The unstructured document has a drawing order that indicates the order in which the primitive graphic elements are drawn when the unstructured document is displayed. The method identifies positional relationships between successive primitive graphic elements in the drawing order. Based on the positional relationships, the method defines a single structural graphic element from several of the primitive graphic elements.
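The grouping described in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes that each primitive graphic element is reduced to an axis-aligned bounding box, and that the "positional relationship" between successive elements is simply whether those boxes overlap. The names `Bounds` and `group_successive` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bounds:
    """Axis-aligned bounding box of one primitive graphic element (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Bounds") -> bool:
        # True when the two boxes share area or touch along an edge.
        return (self.x0 <= other.x1 and other.x0 <= self.x1
                and self.y0 <= other.y1 and other.y0 <= self.y1)

    def union(self, other: "Bounds") -> "Bounds":
        # Smallest box that contains both boxes.
        return Bounds(min(self.x0, other.x0), min(self.y0, other.y0),
                      max(self.x1, other.x1), max(self.y1, other.y1))


def group_successive(primitives: List[Bounds]) -> List[List[int]]:
    """Group indices of successive primitives (in drawing order) whose bounds overlap.

    Assumption: the positional relationship is simplified to overlap between
    the next element and the running bounds of the current group.
    """
    groups: List[List[int]] = []
    running: Optional[Bounds] = None
    for i, b in enumerate(primitives):
        if running is not None and running.overlaps(b):
            groups[-1].append(i)       # extend the current candidate graphic
            running = running.union(b)
        else:
            groups.append([i])         # start a new candidate graphic
            running = b
    return groups


# Drawing order: two overlapping shapes, two more overlapping shapes, one isolated shape.
drawing_order = [
    Bounds(0, 0, 2, 2), Bounds(1, 1, 3, 3),
    Bounds(10, 10, 11, 11), Bounds(10.5, 10.5, 12, 12),
    Bounds(50, 50, 51, 51),
]
groups = group_successive(drawing_order)
```

Each inner list in `groups` is a candidate for a single structural graphic element. Because the sketch walks the list strictly in drawing order, elements that overlap but are drawn far apart in that order are not merged.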
22 Claims
1. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document, the program comprising sets of instructions for:
receiving a document that comprises a plurality of primitive graphic elements defined separately within the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the document is displayed;
calculating a first value for a first primitive graphic element and a second primitive graphic element that is subsequent to the first primitive graphic element in the drawing order by using bounds of the first and second primitive graphic elements;
based on a comparison of the first value to other values calculated for additional primitive graphic elements that are subsequent in the drawing order, defining a cluster comprising at least the first and second primitive graphic elements; and
when the bounds of the first and second primitive graphic elements at least partially overlap, defining a single structural graphic element within the document from the first and second primitive graphic elements.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for analyzing a document, the method comprising:
receiving a document that comprises a plurality of primitive graphic elements defined separately within the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the document is displayed;
calculating a first value for a first primitive graphic element and a second primitive graphic element that is subsequent to the first primitive graphic element in the drawing order by using bounds of the first and second graphic elements;
based on a comparison of the first value to other values calculated for additional primitive graphic elements that are subsequent in the drawing order, defining a cluster comprising at least the first and second primitive graphic elements; and
when the bounds of the first and second primitive graphic elements at least partially overlap, defining a single structural graphic element within the document from the first and second primitive graphic elements.
Dependent claims: 9, 10, 11, 12
13. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document, the program comprising sets of instructions for:
receiving a document that comprises a plurality of primitive graphic elements defined separately within the document;
based on values calculated for pairs of primitive graphic elements, defining a set of successive primitive graphic elements;
when bounds of primitive graphic elements in the set of successive primitive graphic elements at least partially overlap each other, identifying overlapping primitive graphic elements as subsets of primitive graphic elements within the set of successive primitive elements;
calculating, for each of the subsets that have one or more primitive graphic elements, a total spread using bounds of the one or more primitive graphic elements and dimensions of a page containing the primitive graphic elements; and
for each of the subsets that have one or more primitive graphic elements and that have a total spread less than a predetermined value, defining a single structural graphic element within the document, the single structural graphic element comprising the primitive graphic elements in the subset.
Dependent claims: 14, 15, 16, 17
18. An apparatus comprising:
a set of processing units; and
a machine readable medium storing a program which when executed by at least one processing unit analyzes a document, the program comprising sets of instructions for:
receiving a document that comprises a plurality of primitive graphic elements defined separately within the document;
based on values calculated for pairs of primitive graphic elements, defining a set of successive primitive graphic elements;
when bounds of primitive graphic elements in the set of successive primitive graphic elements at least partially overlap each other, identifying overlapping primitive graphic elements as subsets of primitive graphic elements within the set of successive primitive elements;
calculating, for each of the subsets that have one or more primitive graphic elements, a total spread using bounds of the one or more primitive graphic elements and dimensions of a page containing the primitive graphic elements; and
for each of the subsets that have one or more primitive graphic elements and that have a total spread less than a predetermined value, defining a single structural graphic element within the document, the single structural graphic element comprising the primitive graphic elements in the subset.
Dependent claims: 19, 20, 21, 22
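The steps recited in the independent claims (a value for each successive pair, clustering by comparing those values, overlap subsets, and a total-spread test against the page dimensions) can be sketched as follows. The claims do not give formulas, so every concrete choice here is an assumption made for illustration: the pair value is taken to be the gap between two bounding boxes, a cluster is cut where a pair value exceeds `factor` times the median pair value, and total spread is taken to be the largest fraction of the page width or height covered by a subset. All function names and parameters are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) bounds of one primitive


def pair_value(a: Box, b: Box) -> float:
    """Hypothetical pair value: Euclidean gap between two boxes (0.0 if they overlap or touch)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def cluster_successive(boxes: List[Box], factor: float = 2.0) -> List[List[int]]:
    """Cut the drawing order where a pair value is large compared with the other pair values.

    Assumption: "large" means greater than factor * median of all pair values.
    """
    if not boxes:
        return []
    values = [pair_value(boxes[i], boxes[i + 1]) for i in range(len(boxes) - 1)]
    cutoff = factor * median(values) if values else 0.0
    clusters = [[0]]
    for i, v in enumerate(values):
        if v > cutoff:
            clusters.append([i + 1])    # gap too large: start a new cluster
        else:
            clusters[-1].append(i + 1)
    return clusters


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share area or touch along an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlap_subsets(boxes: List[Box], cluster: List[int]) -> List[List[int]]:
    """Partition a cluster into subsets connected by pairwise bounds overlap."""
    subsets: List[List[int]] = []
    for i in cluster:
        touching = [s for s in subsets if any(overlaps(boxes[i], boxes[j]) for j in s)]
        merged = sorted([j for s in touching for j in s] + [i])
        subsets = [s for s in subsets if s not in touching] + [merged]
    return subsets


def total_spread(boxes: List[Box], subset: List[int], page_w: float, page_h: float) -> float:
    """Hypothetical total spread: largest page fraction covered by the subset's union box."""
    x0 = min(boxes[i][0] for i in subset)
    y0 = min(boxes[i][1] for i in subset)
    x1 = max(boxes[i][2] for i in subset)
    y1 = max(boxes[i][3] for i in subset)
    return max((x1 - x0) / page_w, (y1 - y0) / page_h)


def structural_elements(boxes: List[Box], page_w: float, page_h: float,
                        max_spread: float = 0.5) -> List[List[int]]:
    """Return index subsets that would each become one structural graphic element."""
    result = []
    for cluster in cluster_successive(boxes):
        for subset in overlap_subsets(boxes, cluster):
            if total_spread(boxes, subset, page_w, page_h) < max_spread:
                result.append(subset)
    return result


# Three overlapping shapes, then two overlapping shapes far away, on a 100 x 100 page.
shapes = [(0, 0, 2, 2), (1, 1, 3, 3), (2.5, 0, 4, 1.5), (60, 60, 62, 62), (61, 61, 63, 63)]
found = structural_elements(shapes, page_w=100, page_h=100)

# Two overlapping rules that together span most of the page width are rejected.
rules = [(0, 0, 90, 5), (80, 0, 95, 5)]
rejected = structural_elements(rules, page_w=100, page_h=100)
```

In this sketch the spread test keeps small, compact groups and discards subsets that stretch across most of the page, such as long rules or borders. The cutoff `max_spread=0.5` stands in for the "predetermined value" in the claims and is an arbitrary choice.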
Specification