Content profiling to dynamically configure content processing
Abstract
Some embodiments provide a method that receives an unstructured document including a number of primitive elements. The method identifies a default set of document reconstruction operations for reconstructing the unstructured document to define a structured document. The method performs at least one of the document reconstruction operations from the default set. Based on results of the performed document reconstruction operations, the method identifies a profile for the unstructured document. The method modifies the set of document reconstruction operations for reconstructing the unstructured document according to the identified profile.
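The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Profile` class, the `reconstruct` function, the `probe_count` parameter, and the three toy operations are all hypothetical names invented here, and the rule "skip table detection for single-column documents" is an arbitrary example of a modification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Results = Dict[str, int]  # hypothetical: results of operations run so far


@dataclass
class Profile:
    """Hypothetical profile: a test on results plus a change to the operation list."""
    name: str
    matches: Callable[[Results], bool]
    modify: Callable[[List[str]], List[str]]


def reconstruct(document, operations, default_ops, profiles, probe_count=1):
    """Run some default operations, identify a profile, then run the modified rest."""
    results: Results = {}
    executed: List[str] = []
    # Perform at least one operation from the default set.
    for name in default_ops[:probe_count]:
        operations[name](document, results)
        executed.append(name)
    remaining = list(default_ops[probe_count:])
    # Identify a profile based on the results obtained so far.
    for profile in profiles:
        if profile.matches(results):
            remaining = profile.modify(remaining)
            break
    # Perform the (possibly modified) remaining operations.
    for name in remaining:
        operations[name](document, results)
        executed.append(name)
    return results, executed


# Toy operations on a toy document; real operations would work on glyphs and graphics.
def count_columns(doc, results):
    results["columns"] = len(doc["columns"])


def find_paragraphs(doc, results):
    results["paragraphs"] = sum(len(col) for col in doc["columns"])


def detect_tables(doc, results):
    results["tables"] = 0


OPERATIONS = {
    "count_columns": count_columns,
    "find_paragraphs": find_paragraphs,
    "detect_tables": detect_tables,
}
DEFAULT_OPS = ["count_columns", "find_paragraphs", "detect_tables"]

# Illustrative rule only: single-column documents skip table detection.
single_column = Profile(
    name="single_column",
    matches=lambda r: r.get("columns") == 1,
    modify=lambda ops: [op for op in ops if op != "detect_tables"],
)

doc = {"columns": [["p1", "p2", "p3"]]}
results, executed = reconstruct(doc, OPERATIONS, DEFAULT_OPS, [single_column])
print(executed)  # ['count_columns', 'find_paragraphs']
```

Representing the operation set as a list of names keeps the modification step a plain list transformation, which makes it easy to see exactly what a matched profile changed.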
25 Claims
1. A method for defining a program for reconstructing a document, the method comprising:
defining a default set of document reconstruction operations for (i) identifying sets of primitive elements in an unstructured document that comprises a plurality of unassociated primitive elements and (ii) defining associations between the sets of primitive elements as structural elements in order to define a structured document from the unstructured document, wherein the primitive elements comprise at least one of glyphs and vector graphics;
defining a hierarchical set of profiles, each particular profile comprising (i) a set of clauses specifying potential results of previously-performed document reconstruction operations and (ii) instructions for modifying the set of document reconstruction operations to perform when actual results of the previously-performed document reconstruction operations match the set of clauses for the particular profile in order to define a modified set of document reconstruction operations different from the default set of document reconstruction operations, the results of the previously-performed document reconstruction operations comprising the structural elements defined as associations between sets of primitive elements of the document by at least one of the performed document reconstruction operations, wherein instructions from a profile at a lower level in the hierarchical set of profiles override instructions from a profile at a higher level; and
defining a module for matching results of the previously-performed document reconstruction operations for a particular portion of the document to one of a plurality of profiles at a particular level in the hierarchical set of profiles in order to modify the set of document reconstruction operations to perform for the particular portion of the document, wherein the particular level of the hierarchy corresponds to the particular portion of the document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
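The two distinctive elements of claim 1, matching one profile per level and letting lower-level instructions override higher-level ones, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `Profile` fields, the functions `match_profile` and `resolve_operations`, the operation names, the "document" and "page" levels, and the choice to model clauses as exact-value checks and instructions as on/off switches. The claim itself does not prescribe any of these.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Profile:
    """Hypothetical profile node in a hierarchy."""
    name: str
    clauses: Dict[str, int]         # result values that must all be observed
    instructions: Dict[str, bool]   # operation name -> enabled or disabled
    children: List["Profile"] = field(default_factory=list)  # next lower level


def match_profile(candidates: List[Profile], results: Dict[str, int]) -> Optional[Profile]:
    """Return the first candidate whose clauses all match the actual results."""
    for profile in candidates:
        if all(results.get(key) == value for key, value in profile.clauses.items()):
            return profile
    return None


def resolve_operations(default_ops, top_level, results_by_level):
    """Walk down the hierarchy, one level per document portion.

    Instructions are applied top-down, so a lower-level profile's
    instructions overwrite those of the higher-level profile above it.
    """
    enabled = dict(default_ops)
    candidates = top_level
    for results in results_by_level:
        profile = match_profile(candidates, results)
        if profile is None:
            break
        enabled.update(profile.instructions)
        candidates = profile.children
    return enabled


DEFAULT_OPS = {"find_columns": True, "detect_tables": True, "join_hyphenated_words": True}

# Page-level profile (lower level): re-enables table detection for gridded pages.
table_page = Profile(
    name="table_page",
    clauses={"has_grid": 1},
    instructions={"detect_tables": True, "find_columns": False},
)
# Document-level profile (higher level): disables table detection by default.
report = Profile(
    name="report",
    clauses={"columns": 2},
    instructions={"detect_tables": False},
    children=[table_page],
)

# Results for the whole document, then for one particular page.
resolved = resolve_operations(DEFAULT_OPS, [report], [{"columns": 2}, {"has_grid": 1}])
print(resolved)
```

In this toy run the document-level profile turns table detection off, and the page-level profile turns it back on for the one page that matched, which is the override direction the claim recites.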
13. A non-transitory machine readable medium storing a program which when executed by at least one processing unit reconstructs a document, the program comprising:
a default set of document reconstruction operations for (i) identifying sets of primitive elements in an unstructured document that comprises a plurality of unassociated primitive elements and (ii) defining associations between the sets of primitive elements as structural elements in order to define a structured document from the unstructured document, wherein the primitive elements comprise at least one of glyphs and vector graphics;
a hierarchical set of profiles, each particular profile comprising (i) a set of clauses specifying potential results of previously-performed document reconstruction operations and (ii) instructions for modifying the set of document reconstruction operations to perform when actual results of the previously-performed document reconstruction operations match the set of clauses for the particular profile in order to define a modified set of document reconstruction operations different from the default set of document reconstruction operations, the results of the previously-performed document reconstruction operations comprising the structural elements defined as associations between sets of primitive elements of the document by at least one of the performed document reconstruction operations, wherein instructions from a profile at a lower level in the hierarchical set of profiles override instructions from a profile at a higher level; and
a module for matching results of the previously-performed document reconstruction operations for a particular portion of the document to one of a plurality of profiles at a particular level in the hierarchical set of profiles in order to modify the set of document reconstruction operations to perform for the particular portion of the document, wherein the particular level of the hierarchy corresponds to the particular portion of the document.
Dependent claims: 14, 15, 16, 17, 18, 19.
20. A system comprising:
a set of processing units for executing sets of instructions; and
a memory storing a program which when executed by at least one of the processing units reconstructs a document, the program comprising:
a default set of document reconstruction operations for (i) identifying sets of primitive elements in an unstructured document that comprises a plurality of unassociated primitive elements and (ii) defining associations between the sets of primitive elements as structural elements in order to define a structured document from the unstructured document, wherein the primitive elements comprise at least one of glyphs and vector graphics;
a hierarchical set of profiles, each particular profile comprising (i) a set of clauses specifying potential results of previously-performed document reconstruction operations and (ii) instructions for modifying the set of document reconstruction operations to perform when actual results of the previously-performed document reconstruction operations match the set of clauses for the particular profile in order to define a modified set of document reconstruction operations different from the default set of document reconstruction operations, the results of the previously-performed document reconstruction operations comprising the structural elements defined as associations between sets of primitive elements of the document by at least one of the performed document reconstruction operations, wherein instructions from a profile at a lower level in the hierarchical set of profiles override instructions from a profile at a higher level; and
a module for matching results of the previously-performed document reconstruction operations for a particular portion of the document to one of a plurality of profiles at a particular level in the hierarchical set of profiles in order to modify the set of document reconstruction operations to perform for the particular portion of the document, wherein the particular level of the hierarchy corresponds to the particular portion of the document.
Dependent claims: 21, 22, 23, 24, 25.
Specification