Method and expert system for deducing document structure in document conversion
First Claim
1. An expert system for more efficiently and accurately deducing document structure from document formatting, the expert system comprising:
- a conversion engine for converting an unstructured file to a structured file, the conversion engine configured to locate document formatting including frequency of usage, repetitions and locations of text, spacing of text and style of text in the unstructured file to initially deduce document structure from the document formatting; and
- a verification engine, responsive to the output of the conversion engine, for generating and displaying a visual representation file of the structured file annotated with visual depictions of the classified components of the structured file on a display device so that the annotations with the visual depictions of the classified components can be modified, classifications of the components can be added and classifications of the components can be suggested by an example, and the structured file reprocessed by the conversion engine which to further deduce the document structure, uses the initially deduced document structure, the annotations that are modified, the classifications that are added, the classifications that are suggested, a rule that is derived from the examples provided via the verification engine and all occurrences in the structured file that match the derived rule, the conversion engine and the verification engine operating iteratively until an operator indicates the structured file annotated with visual depictions is correct.
Abstract
An expert system for more efficiently and accurately deducing document structure from document formatting, the expert system including a conversion engine for converting an unstructured file to a structured file, and a verification engine, responsive to the output of the conversion engine, for generating and displaying a representation of the structured file annotated with visual depictions of the classified components thereof so that the annotations can be modified and/or classifications can be added and/or classifications can be suggested, and/or rules for classification can be suggested and the structured file reprocessed by the conversion engine.
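As an illustration of what "deducing document structure from document formatting" could look like, the sketch below labels text blocks by how often their style is used. It is not taken from the patent: the `Style` and `Block` types, the `classify_by_frequency` function, and the heuristic itself (the most frequent style is body text; a larger or bolder style is a heading) are all assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Hypothetical formatting signature of a text block."""
    font_size: float
    bold: bool


@dataclass
class Block:
    """Hypothetical unit of text in the unstructured file."""
    text: str
    style: Style
    label: str = "unclassified"


def classify_by_frequency(blocks):
    """Assumed heuristic: the most frequently used style is body text;
    blocks in a larger or bolder style are headings; the rest stay
    unclassified so an operator can label them later."""
    if not blocks:
        return blocks
    counts = Counter(b.style for b in blocks)
    body_style, _ = counts.most_common(1)[0]
    for b in blocks:
        if b.style == body_style:
            b.label = "paragraph"
        elif b.style.font_size > body_style.font_size or (
            b.style.bold and not body_style.bold
        ):
            b.label = "heading"
        else:
            b.label = "unclassified"
    return blocks


body = Style(font_size=10.0, bold=False)
sample = [
    Block("Introduction", Style(font_size=14.0, bold=True)),
    Block("First paragraph.", body),
    Block("Second paragraph.", body),
    Block("Third paragraph.", body),
    Block("A small-print note.", Style(font_size=8.0, bold=False)),
]
labels = [b.label for b in classify_by_frequency(sample)]
```

In this sample the first block comes out as a heading, the three blocks in the dominant style as paragraphs, and the small-print block stays unclassified, which is the kind of component the verification step would leave for an operator to label.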
56 Claims
1. An expert system for more efficiently and accurately deducing document structure from document formatting, the expert system comprising:
- a conversion engine for converting an unstructured file to a structured file, the conversion engine configured to locate document formatting including frequency of usage, repetitions and locations of text, spacing of text and style of text in the unstructured file to initially deduce document structure from the document formatting; and
- a verification engine, responsive to the output of the conversion engine, for generating and displaying a visual representation file of the structured file annotated with visual depictions of the classified components of the structured file on a display device so that the annotations with the visual depictions of the classified components can be modified, classifications of the components can be added and classifications of the components can be suggested by an example, and the structured file reprocessed by the conversion engine which to further deduce the document structure, uses the initially deduced document structure, the annotations that are modified, the classifications that are added, the classifications that are suggested, a rule that is derived from the examples provided via the verification engine and all occurrences in the structured file that match the derived rule, the conversion engine and the verification engine operating iteratively until an operator indicates the structured file annotated with visual depictions is correct.

Dependent claims: 2 through 46
47. A method for more efficiently and accurately deducing document structure from document formatting, the method comprising:
a) converting an unstructured file to a structured file wherein the components of the structured file are classified, the step of converting including locating document formatting that include frequency of usage, repetitions and locations of text, spacing of text and style of text in the unstructured file to initially deduce document structure from the document formatting;
b) generating and displaying a visual representation file of the structured file annotated with visual depictions of the classified components thereof;
c) at least one of the steps of modifying the annotated visual depictions, adding classifications of the components, and suggesting classifications of the components by an example associated with step a); and
d) repeating steps a), b), and c) until no further modifications, additions or suggestions are made, step a) further including deducing the document structure by using the initially deduced document structure, the annotated visual depictions that are modified, the classifications that are added, the classifications that are suggested, a rule that is derived from the example, and all occurrences that match the derived rule.

Dependent claims: 48 through 56
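The loop of steps a) through d) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `Component`, `Rule`, `derive_rule`, `reprocess`, and `convert_and_verify` are hypothetical names, the operator is simulated by a `review` callback, and the way a rule is generalised from one example (same boldness, optional leading section number) is an invented heuristic.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a leading section number such as "1 " or "2.3 ".
NUMBER_PREFIX = r"\d+(\.\d+)*\s"


@dataclass
class Component:
    """Hypothetical classified component of the structured file."""
    text: str
    bold: bool
    label: str = "paragraph"


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule derived from one operator-supplied example."""
    bold: bool
    pattern: str  # regex applied to the start of the text; "" matches anything
    label: str

    def matches(self, c):
        return c.bold == self.bold and re.match(self.pattern, c.text) is not None


def derive_rule(example, label):
    """Assumed generalisation: keep the example's boldness and, if it
    starts with a section number, require one in every match."""
    pattern = NUMBER_PREFIX if re.match(NUMBER_PREFIX, example.text) else ""
    return Rule(bold=example.bold, pattern=pattern, label=label)


def reprocess(components, rules):
    """Step a) on later passes: apply each rule to all matching occurrences."""
    for c in components:
        for rule in rules:
            if rule.matches(c):
                c.label = rule.label
    return components


def convert_and_verify(components, review):
    """Repeat convert / display / correct until the reviewer has no changes.
    `review` stands in for the operator: it returns (index, label) to
    suggest a classification by example, or None when the result is correct."""
    rules = []
    while True:
        reprocess(components, rules)
        suggestion = review(components)
        if suggestion is None:
            return components
        index, label = suggestion
        rules.append(derive_rule(components[index], label))


doc = [
    Component("1 Introduction", bold=True),
    Component("Some body text.", bold=False),
    Component("2 Methods", bold=True),
    Component("Note", bold=True),
]
scripted = iter([(0, "heading"), None])  # one correction, then approval
result = [c.label for c in convert_and_verify(doc, lambda comps: next(scripted))]
```

Here a single example ("1 Introduction" is a heading) yields a rule that also reclassifies "2 Methods" on the next pass, while the bold but unnumbered "Note" is left as a paragraph, so the operator corrects one occurrence rather than every one.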
Specification