Systems and methods for converting legacy and proprietary documents into extended mark-up language format
First Claim
1. A method for converting a legacy or proprietary document into extensible mark-up language format, comprising:
inputting a document having a proprietary format;
converting the proprietary format document to a document having a standard representation format;
preparing a structured representation of the standard representation format document with an exemplary source document schema;
determining an exemplary target document schema;
preparing a structured representation conforming with the exemplary target document schema;
annotating a subset of the standard representation format document to define exemplary target structured representations that satisfy the exemplary target document schema;
decomposing a two-dimensional representation of the source document schema using multiple one-dimensional methods;
developing translation rules to instruct a parser to visit the two-dimensional representation of the source document schema to group and/or nest labeled elements in an exemplary output document schema;
preparing a source document schema structured representation of the standard representation format source document;
applying the translation rules to visit the two-dimensional representation of the source document schema and group and/or nest labeled elements in an output document in a target structured representation format; and
converting the target structured representation format document to an output target representation document format.
Abstract
A system and method for converting legacy and proprietary documents into extended mark-up language format treats the conversion as a transformation of ordered trees of one schema and/or model into ordered trees of another schema and/or model. In embodiments, the tree transformers are coded using a learning method that decomposes the conversion task into three components: path re-labeling, structural composition, and input tree traversal, each of which involves learning approaches. Transforming an input tree into an output tree may involve labeling components in the input tree with valid labels or paths from a particular output schema, composing the labeled elements into an output tree with a valid structure, and finding a traversal of the input tree that achieves the correct composition of the output tree and applies structural rules.
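The three components named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract describes learned components, whereas this sketch substitutes a fixed lookup table for path re-labeling and a plain document-order walk for input tree traversal. The source markup, the `RELABEL` table, the output schema (`article`, `meta`, `body`), and all function names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a standard representation (XHTML-like).
SOURCE = (
    "<html><body>"
    "<h1>Title</h1>"
    "<span class='byline'>A. Author</span>"
    "<p>First.</p><p>Second.</p>"
    "</body></html>"
)

# Hypothetical stand-in for learned path re-labeling: maps a source-side
# label to a path in an assumed output schema.
RELABEL = {
    "h1": "article/title",
    "byline": "article/meta/author",
    "p": "article/body/para",
}


def traverse(source_root):
    """Input tree traversal: yield (label, text) leaves in document order."""
    for el in source_root.iter():
        text = (el.text or "").strip()
        if text:
            # Assumption: the class attribute, when present, is the label.
            yield el.get("class", el.tag), text


def relabel(leaves):
    """Path re-labeling: attach an output-schema path to each known leaf."""
    return [(RELABEL[label], text) for label, text in leaves if label in RELABEL]


def compose(labeled_leaves):
    """Structural composition: group and nest labeled leaves into one tree."""
    root = None
    for path, text in labeled_leaves:
        tags = path.split("/")
        if len(tags) < 2:
            raise ValueError("path needs a root and a leaf: " + path)
        if root is None:
            root = ET.Element(tags[0])
        elif root.tag != tags[0]:
            raise ValueError("conflicting root element: " + tags[0])
        node = root
        # Reuse an existing intermediate element so siblings are grouped.
        for tag in tags[1:-1]:
            child = node.find(tag)
            node = child if child is not None else ET.SubElement(node, tag)
        ET.SubElement(node, tags[-1]).text = text
    return root


def convert(source_xml):
    """Run traversal, re-labeling and composition, then serialize to XML."""
    leaves = traverse(ET.fromstring(source_xml))
    return ET.tostring(compose(relabel(leaves)), encoding="unicode")
```

Calling `convert(SOURCE)` groups the author under a single `meta` element and both paragraphs under a single `body` element, which is the grouping and nesting of labeled elements that the abstract describes. In a learned setting, the lookup table and the fixed traversal order would be replaced by models trained on annotated examples.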
3 Claims
Specification