Document alignment systems for legacy document conversions
Abstract
A method for aligning documents which may be in different XML formats includes inputting source and target leaves of source and target documents in first and second tree structured formats and assigning a cost to each of a plurality of matches. Each match may include a source leaf and a target leaf or be an unmatched source or target leaf. Matches are identified for which a total cost is minimal, wherein each of the leaves is in at least one of the identified matches. From the identified matches, groups of two or more matches are identified which have a leaf in common. From the groups, probable matches are identified in which more than one target leaf is matched with at least one source leaf or more than one source leaf is matched with a target leaf. An alignment between leaves of the target document and leaves of the source document is output which includes the probable matches.
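As an illustration of the cost assignment step, the following minimal Python sketch scores a match by the text similarity of its leaves. The use of `difflib.SequenceMatcher`, the names `match_cost` and `cost_matrix`, and the default `unmatched_cost` of 0.6 are assumptions made for this example only; the abstract does not specify a particular similarity measure or penalty.

```python
from difflib import SequenceMatcher


def match_cost(source_text, target_text, unmatched_cost=0.6):
    """Return an illustrative cost for one match.

    A match is either a (source leaf, target leaf) pair or a single
    unmatched leaf, represented here by passing None for the other side.
    The penalty value is a hypothetical choice, not taken from the patent.
    """
    if source_text is None or target_text is None:
        return unmatched_cost
    # ratio() is 1.0 for identical text and 0.0 when nothing is shared,
    # so the cost is 0.0 for identical leaves and 1.0 for unrelated ones.
    return 1.0 - SequenceMatcher(None, source_text, target_text).ratio()


def cost_matrix(source_leaves, target_leaves):
    """Build a source-by-target matrix of pairwise match costs."""
    return [[match_cost(s, t) for t in target_leaves] for s in source_leaves]
```

A cost that falls as similarity rises means that minimizing the total cost favors pairing leaves with similar content, while the fixed penalty keeps dissimilar leaves unmatched rather than forcing a pairing.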
22 Claims
1. A document alignment method comprising:
inputting source leaves of a source document in first tree structured format;
inputting target leaves of a target document in second tree structured format;
assigning a cost to each of a plurality of matches, each match comprising a pair of elements selected from the group consisting of a source leaf and a target leaf, an unmatched source leaf, and an unmatched target leaf;
identifying matches for which a total cost is minimal, wherein each of the leaves is in at least one of the identified matches;
identifying, from the identified matches, groups of matches wherein each match in the group has a leaf in common;
identifying, from the groups, probable matches in which more than one target leaf is matched with at least one source leaf and probable matches where more than one source leaf is matched with a target leaf;
outputting an alignment between leaves of the target document and leaves of the source document which includes the probable matches.
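The grouping step of claim 1 can be sketched as follows. This is a minimal Python illustration, assuming each identified match is represented as a `(source_index, target_index)` tuple with `None` marking an unmatched leaf; that representation and the function name `group_by_shared_leaf` are hypothetical and not drawn from the claims.

```python
from collections import defaultdict


def group_by_shared_leaf(matches):
    """Group matches that have a leaf in common.

    Returns two dicts:
      one_to_many: source index -> list of target indices (more than one)
      many_to_one: target index -> list of source indices (more than one)
    Matches with an unmatched leaf (None) cannot share a pairing and
    are skipped.
    """
    by_source = defaultdict(list)
    by_target = defaultdict(list)
    for source, target in matches:
        if source is None or target is None:
            continue
        by_source[source].append(target)
        by_target[target].append(source)
    # Keep only leaves that appear in at least two matches.
    one_to_many = {s: ts for s, ts in by_source.items() if len(ts) > 1}
    many_to_one = {t: ss for t, ss in by_target.items() if len(ss) > 1}
    return one_to_many, many_to_one


matches = [(0, 0), (0, 1), (1, 2), (2, 2), (3, None), (None, 3)]
one_to_many, many_to_one = group_by_shared_leaf(matches)
# one_to_many == {0: [0, 1]}: source leaf 0 pairs with target leaves 0 and 1
# many_to_one == {2: [1, 2]}: target leaf 2 pairs with source leaves 1 and 2
```

Groups found this way are only candidates; the claim treats the selection of probable matches from the groups as a further step, which this sketch does not attempt.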
18. A method of alignment comprising:
inputting a matrix with cost values which are a function of a measure of similarity for the content of pairs of leaves, the pairs of leaves each including a leaf of a source document and a leaf of a target document, each of the leaves including document content;
computing a minimum edit distance for the matrix based on the input cost values, whereby each leaf from the source document is aligned with a leaf of the target document or with no leaf of the target document and each leaf from the target document is aligned with a leaf of the source document or with no leaf of the source document; and
from the matrix alignments, identifying candidate matches in which a leaf of at least one of the source and target documents matches a combination of leaves of the other of the source and target documents;
refining the candidate matches to identify probable matches; and
outputting an alignment of the leaves of the first document with the leaves of the second document which includes matches of at least some of the leaves of the first document with at least some of the leaves of the second document.
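The minimum edit distance step of claim 18 can be illustrated with a standard dynamic-programming alignment over a cost matrix. The sketch below is an assumption-laden example rather than the claimed implementation: the function name `min_edit_alignment`, the fixed `gap` penalty for aligning a leaf with no leaf, and the tuple output format are all hypothetical. It covers only the one-to-one alignment and does not perform the later candidate and probable match steps.

```python
def min_edit_alignment(cost, gap=0.5):
    """Align source leaves (rows) with target leaves (columns).

    cost[i][j] is the cost of pairing source leaf i with target leaf j;
    gap is an assumed penalty for aligning a leaf with no leaf.
    Returns (total_cost, alignment), where alignment is a list of
    (source_index, target_index) tuples and None marks "no leaf".
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Borders: every leaf so far is aligned with no leaf.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + gap
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost[i - 1][j - 1],  # pair the leaves
                dist[i - 1][j] + gap,                     # source unmatched
                dist[i][j - 1] + gap,                     # target unmatched
            )
    # Trace back from the bottom-right corner to recover the alignment.
    alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost[i - 1][j - 1]:
            alignment.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dist[i][j] == dist[i - 1][j] + gap):
            alignment.append((i - 1, None))
            i -= 1
        else:
            alignment.append((None, j - 1))
            j -= 1
    alignment.reverse()
    return dist[n][m], alignment


# Two source leaves, three target leaves; target leaf 1 has no good partner.
total, alignment = min_edit_alignment([[0.0, 0.9, 0.9], [0.9, 0.9, 0.0]])
# total == 0.5
# alignment == [(0, 0), (None, 1), (1, 2)]
```

Because this form of edit distance preserves leaf order, it suits documents whose content appears in the same sequence in both formats; that ordering assumption belongs to the sketch and is not stated in the claim.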
20. A document alignment apparatus comprising:
an input device for inputting source leaves of a source document in first tree structured format and inputting target leaves of a target document in second tree structured format;
memory for storing the input source and target leaves;
a processing module which assigns a cost to each of a plurality of matches, each match comprising a pair of elements selected from the group consisting of a source leaf and a target leaf, an unmatched source leaf, and an unmatched target leaf;
a processing module which identifies matches for which a total cost is minimal, wherein each of the leaves is in at least one of the identified matches;
a processing module which identifies, from the identified matches, groups of matches wherein each match in the group has a leaf in common;
a processing module which identifies, from the groups, probable matches in which more than one target leaf is matched with at least one source leaf and probable matches where more than one source leaf is matched with a target leaf; and
an output device for outputting an alignment between leaves of the target document and leaves of the source document which includes the identified probable matches.
Specification