Identification of compound graphic elements in an unstructured document
Abstract
Some embodiments provide a method of analyzing an unstructured document. The method receives the unstructured document, which includes a number of primitive graphic elements, each of which is defined as a single object in the unstructured document. The unstructured document has a drawing order that indicates the order in which the primitive graphic elements are drawn when the unstructured document is displayed. The method identifies positional relationships between successive primitive graphic elements in the drawing order. Based on the positional relationships, the method defines a single structural graphic element from several of the primitive graphic elements.
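The abstract leaves the "positional relationship" between successive primitives unspecified. A minimal sketch, assuming each primitive is reduced to an axis-aligned bounding box `(x0, y0, x1, y1)` listed in drawing order and that the relationship is the gap between successive boxes; the `max_gap` threshold and all function names are hypothetical, not taken from the patent.

```python
import math
from typing import List, Tuple

# Hypothetical representation: one (x0, y0, x1, y1) box per primitive, in drawing order.
BBox = Tuple[float, float, float, float]


def gap(a: BBox, b: BBox) -> float:
    """Euclidean gap between two boxes; 0 when they touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def successive_gaps(boxes: List[BBox]) -> List[float]:
    """Positional relationship for each pair of successive primitives."""
    return [gap(boxes[i], boxes[i + 1]) for i in range(len(boxes) - 1)]


def group_successive(boxes: List[BBox], max_gap: float = 1.0) -> List[List[int]]:
    """Start a new group whenever the gap to the next primitive exceeds max_gap."""
    if not boxes:
        return []
    groups = [[0]]
    for i, g in enumerate(successive_gaps(boxes)):
        if g <= max_gap:
            groups[-1].append(i + 1)  # close enough: same structural element
        else:
            groups.append([i + 1])    # too far: begin a new element
    return groups
```

Each returned group lists primitive indices that would be treated as one structural graphic element under this fixed-threshold assumption.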
39 Claims
1. A computer readable medium storing a computer program which when executed by at least one processor analyzes a document, the computer program comprising sets of instructions for:
receiving a document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the document is displayed;
calculating, for a first primitive graphic element and a second primitive graphic element that is subsequent to the first in the drawing order, a size of a single element that comprises the first and second primitive graphic elements; and
based on the size of the single element compared to a plurality of different size calculations for additional primitive graphic elements that are subsequent in the drawing order, defining a single structural graphic element within the document from the first and second primitive graphic elements.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 28.
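One possible reading of claim 1, sketched under stated assumptions: the "size of a single element" is taken as the half-perimeter of the bounding box enclosing both primitives, and a successive pair is merged when that size is at most `ratio` times the smallest union size computed for the pairs that follow it. The `Box` type, the half-perimeter measure, the `ratio` parameter and the min-based comparison are all hypothetical; the claim does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one primitive graphic element."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        """Box enclosing both primitives, standing in for the 'single element'."""
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))

    def size(self) -> float:
        """Half-perimeter, so zero-area primitives such as rules still have a size."""
        return (self.x1 - self.x0) + (self.y1 - self.y0)


def pair_union_sizes(boxes: List[Box]) -> List[float]:
    """Size of the union of each successive pair in drawing order."""
    return [boxes[i].union(boxes[i + 1]).size() for i in range(len(boxes) - 1)]


def group_by_union_size(boxes: List[Box], ratio: float = 2.0) -> List[List[int]]:
    """Merge successive primitives whose union is small relative to later pairs."""
    if not boxes:
        return []
    sizes = pair_union_sizes(boxes)
    groups = [[0]]
    for i, size in enumerate(sizes):
        # Compare with the size calculations for pairs later in drawing order;
        # the final pair has none, so it is compared with the earlier pairs.
        reference = sizes[i + 1:] or sizes[:i]
        if not reference or size <= ratio * min(reference):
            groups[-1].append(i + 1)
        else:
            groups.append([i + 1])
    return groups
```

A pair whose enclosing box jumps across the page produces a union size far larger than the tight pairs after it, so the sketch starts a new group there.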
19. A method for analyzing a document and generating structural elements that define structure in the document based on the analysis, the method comprising:
calculating relative positional values for successive primitive graphic elements in a drawing order of a document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the drawing order indicating the order in which the primitive graphic elements are drawn when the document is displayed;
identifying groups of the calculated relative positional values by ordering the relative positional values and calculating differences between the ordered values; and
based on the identified groups of relative positional values, defining clusters of primitive graphic elements to associate as structural graphic elements, wherein the relative positional values for pairs of elements within the same cluster of primitive graphic elements are relatively smaller than the relative positional values between pairs of primitive graphic elements in different clusters.
Dependent claims: 20, 21, 39.
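Claim 19 derives the grouping from the data rather than from a fixed threshold. A minimal sketch, assuming the relative positional values are already computed (for example as gaps between successive bounding boxes) and that the ordered values are split into just two groups at the single largest difference; the two-group split and the function names are hypothetical simplifications.

```python
from typing import List


def split_threshold(values: List[float]) -> float:
    """Order the values, find the largest difference between neighbours, and
    return the top of the lower group (values at or below it are 'small')."""
    ordered = sorted(values)
    if len(ordered) < 2:
        return ordered[0] if ordered else 0.0
    diffs = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    k = max(range(len(diffs)), key=lambda i: diffs[i])
    if diffs[k] == 0:
        # All values equal: there is only one group.
        return ordered[-1]
    return ordered[k]


def clusters_from_values(values: List[float]) -> List[List[int]]:
    """values[i] relates primitive i to primitive i + 1 in drawing order, so
    the result covers len(values) + 1 primitives."""
    threshold = split_threshold(values)
    clusters = [[0]]
    for i, v in enumerate(values):
        if v <= threshold:
            clusters[-1].append(i + 1)  # small value: same cluster
        else:
            clusters.append([i + 1])    # large value: start a new cluster
    return clusters
```

Every within-cluster value is at or below the threshold and every between-cluster value is above it, which matches the ordering the claim's "wherein" clause describes. This sketch always makes a split, so it does not decide whether a sequence has any cluster structure at all.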
22. A computer readable medium storing a computer program which when executed by at least one processor analyzes a document, the computer program comprising sets of instructions for:
receiving the document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the unstructured document is displayed;
calculating relative positional values for each pair of successive primitive graphic elements in the drawing order, wherein the calculated relative positional values relate to a size of the primitive graphic elements in the pair;
based on the calculated relative positional values, defining a cluster of successive primitive graphic elements;
identifying a set of sub-clusters of primitive graphic elements in the cluster that satisfy particular constraints; and
defining each particular sub-cluster as a single structural graphic element within the document comprising the primitive graphic elements in the particular sub-cluster.
Dependent claims: 23, 24, 25, 26, 27.
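Claim 22 relates the positional values to element size and then refines each cluster into sub-clusters that satisfy "particular constraints", which the claim leaves open. A minimal sketch, assuming the value for a pair is the larger axis gap between the two bounding boxes divided by their combined half-perimeters, and assuming two hypothetical constraints: a sub-cluster's enclosing box may not exceed `max_extent`, and it must contain at least `min_count` primitives. All names and default values are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: one (x0, y0, x1, y1) box per primitive, in drawing order.
BBox = Tuple[float, float, float, float]


def union(a: BBox, b: BBox) -> BBox:
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def extent(b: BBox) -> float:
    """Half-perimeter; nonzero even for horizontal or vertical lines."""
    return (b[2] - b[0]) + (b[3] - b[1])


def size_relative_gap(a: BBox, b: BBox) -> float:
    """Larger axis gap between the boxes divided by their combined extent."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    gap = max(dx, dy)
    total = extent(a) + extent(b)
    if total == 0:
        # Two points: coincident points join, separated points do not.
        return 0.0 if gap == 0 else float("inf")
    return gap / total


def clusters(boxes: List[BBox], max_rel_gap: float = 0.5) -> List[List[int]]:
    """Maximal runs of successive primitives whose size-relative gap is small."""
    if not boxes:
        return []
    runs = [[0]]
    for i in range(len(boxes) - 1):
        if size_relative_gap(boxes[i], boxes[i + 1]) <= max_rel_gap:
            runs[-1].append(i + 1)
        else:
            runs.append([i + 1])
    return runs


def sub_clusters(boxes: List[BBox], cluster: List[int],
                 max_extent: float, min_count: int) -> List[List[int]]:
    """Greedily split a cluster into runs whose enclosing box stays within
    max_extent, then keep only runs with at least min_count primitives."""
    runs = []
    current = [cluster[0]]
    bound = boxes[cluster[0]]
    for idx in cluster[1:]:
        grown = union(bound, boxes[idx])
        if extent(grown) <= max_extent:
            current.append(idx)
            bound = grown
        else:
            runs.append(current)
            current, bound = [idx], boxes[idx]
    runs.append(current)
    return [run for run in runs if len(run) >= min_count]


def structural_elements(boxes: List[BBox], max_rel_gap: float = 0.5,
                        max_extent: float = 6.0, min_count: int = 2) -> List[List[int]]:
    """Each surviving sub-cluster becomes one structural graphic element."""
    elements = []
    for cluster in clusters(boxes, max_rel_gap):
        elements.extend(sub_clusters(boxes, cluster, max_extent, min_count))
    return elements
```

Dividing the gap by element size means that a given spacing reads as "close" between large shapes but "far" between small ones; isolated primitives fail the `min_count` constraint and are left as ordinary primitives.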
29. A method for analyzing a document, the method comprising:
receiving a document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the document is displayed;
calculating, for a first primitive graphic element and a second primitive graphic element that is subsequent to the first in the drawing order, a size of a single element that comprises the first and second primitive graphic elements; and
based on the size of the single element compared to a plurality of different size calculations for additional primitive graphic elements that are subsequent in the drawing order, defining a single structural graphic element within the document from the first and second primitive graphic elements.
Dependent claims: 30, 31, 32.
33. A method for analyzing a document, the method comprising:
receiving the document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the document having a drawing order that indicates the order in which the primitive graphic elements are drawn when the unstructured document is displayed;
calculating relative positional values for each pair of successive primitive graphic elements in the drawing order, wherein the calculated relative positional values relate to a size of the primitive graphic elements in the pair;
based on the calculated relative positional values, defining a cluster of successive primitive graphic elements;
identifying a set of sub-clusters of primitive graphic elements in the cluster that satisfy particular constraints; and
defining each particular sub-cluster as a single structural graphic element within the document comprising the primitive graphic elements in the particular sub-cluster.
Dependent claims: 34, 35, 36.
37. A machine readable medium storing a program which when executed by at least one processing unit analyzes a document and generates structural elements that define structure in the document based on the analysis, the program comprising sets of instructions for:
calculating relative positional values for successive primitive graphic elements in a drawing order of a document that comprises a plurality of primitive graphic elements, each primitive graphic element defined as a single object in the document, the drawing order indicating the order in which the primitive graphic elements are drawn when the document is displayed;
identifying groups of the calculated relative positional values by ordering the relative positional values and calculating differences between the ordered values; and
based on the identified groups of relative positional values, defining clusters of primitive graphic elements to associate as structural graphic elements, wherein the relative positional values for pairs of elements within the same cluster of primitive graphic elements are relatively smaller than the relative positional values between pairs of primitive graphic elements in different clusters.
Dependent claims: 38.
Specification