
Document alignment systems for legacy document conversions

  • US 7,882,119 B2
  • Filed: 12/22/2005
  • Issued: 02/01/2011
  • Est. Priority Date: 12/22/2005
  • Status: Expired due to Fees
First Claim

1. A document alignment method comprising:

    inputting source leaves of a source document in first tree structured format, the first tree structured format comprising nodes which are ultimately connected with the source leaves by paths, text content of the source document being distributed among the source leaves;

    inputting target leaves of a target document in second tree structured format, the second tree structured format comprising nodes which are ultimately connected with the target leaves by paths, text content of the target document being distributed among the target leaves;

    assigning a cost to each of a plurality of matches based on text content of the leaves, each match comprising elements selected from the group consisting of:

    a source leaf and a target leaf, an unmatched source leaf, and an unmatched target leaf;

    identifying a set of matches for which a total cost is minimal, wherein each of the input source and target leaves is in at least one of the identified matches;

    identifying, from the set of identified matches, groups of matches wherein each match in the group has a leaf in common;

    identifying, from the groups, probable matches in which more than one target leaf is matched with at least one source leaf and probable matches where more than one source leaf is matched with a target leaf;

    outputting an alignment between leaves of the target document and leaves of the source document which includes the probable matches.
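
The claimed steps can be illustrated end to end with a small Python sketch. It is not the patentee's implementation. Several choices are assumptions made only for illustration. Documents are XML, and a leaf is an element with no child elements whose own text is its only content (paths and tail text are ignored). The cost of a source/target pair is 1 minus the `difflib.SequenceMatcher` similarity ratio. The hypothetical constant `UNMATCHED_COST` (0.6) prices an unmatched leaf. The minimal-cost set of matches is found by brute force over every subset of candidate matches, which is exponential and only workable for a handful of leaves. "A leaf in common" is read as every match in a group containing the same leaf. All helper names (`leaf_texts`, `match_costs`, `min_cost_cover`, `groups_by_shared_leaf`, `probable_matches`, `align`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product

# Assumption: hypothetical penalty for leaving a leaf unmatched.
UNMATCHED_COST = 0.6


def leaf_texts(xml_text):
    """Text of every leaf element (no child elements), in document order."""
    root = ET.fromstring(xml_text)
    return [" ".join((e.text or "").split()) for e in root.iter() if len(e) == 0]


def match_costs(src, tgt):
    """Cost of every candidate match; None on one side marks an unmatched leaf."""
    costs = {}
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            # Assumption: 0.0 for identical text, 1.0 for no common characters.
            costs[(i, j)] = 1.0 - SequenceMatcher(None, s, t).ratio()
        costs[(i, None)] = UNMATCHED_COST
    for j in range(len(tgt)):
        costs[(None, j)] = UNMATCHED_COST
    return costs


def min_cost_cover(costs, n_src, n_tgt):
    """Cheapest set of matches in which every leaf appears at least once.

    Brute force over all subsets: exponential, only usable for a few leaves.
    """
    cands = list(costs)
    best, best_cost = [], float("inf")
    for keep in product((False, True), repeat=len(cands)):
        chosen = [m for m, k in zip(cands, keep) if k]
        if len({i for i, _ in chosen if i is not None}) < n_src:
            continue  # some source leaf is not covered
        if len({j for _, j in chosen if j is not None}) < n_tgt:
            continue  # some target leaf is not covered
        total = sum(costs[m] for m in chosen)
        if total < best_cost:
            best, best_cost = chosen, total
    return best


def groups_by_shared_leaf(matches):
    """For each leaf, the matches containing it; only groups of two or more."""
    groups = defaultdict(list)
    for i, j in matches:
        if i is not None:
            groups[("source", i)].append((i, j))
        if j is not None:
            groups[("target", j)].append((i, j))
    return {leaf: ms for leaf, ms in groups.items() if len(ms) > 1}


def probable_matches(groups):
    """One-to-many (split) and many-to-one (merge) matches found in the groups."""
    found = []
    for (side, idx), ms in sorted(groups.items()):
        if side == "source":
            tgts = tuple(sorted({j for _, j in ms if j is not None}))
            if len(tgts) > 1:  # one source leaf spread over several target leaves
                found.append(((idx,), tgts))
        else:
            srcs = tuple(sorted({i for i, _ in ms if i is not None}))
            if len(srcs) > 1:  # several source leaves merged into one target leaf
                found.append((srcs, (idx,)))
    return found


def align(source_xml, target_xml):
    """Alignment as (source_indices, target_indices); () means no counterpart."""
    src, tgt = leaf_texts(source_xml), leaf_texts(target_xml)
    matches = min_cost_cover(match_costs(src, tgt), len(src), len(tgt))
    alignment = probable_matches(groups_by_shared_leaf(matches))
    used_s = {i for s, _ in alignment for i in s}
    used_t = {j for _, t in alignment for j in t}
    for i, j in matches:
        # Keep the remaining one-to-one and unmatched entries as they are.
        if i not in used_s and j not in used_t:
            alignment.append(((i,) if i is not None else (), (j,) if j is not None else ()))
    return alignment
```

For example, with source leaves `abcdef` and `xyz` and target leaves `abc`, `def` and `xyz`, `align` returns `[((0,), (0, 1)), ((1,), (2,))]`: source leaf 0 is split across target leaves 0 and 1, and source leaf 1 maps one-to-one onto target leaf 2. Swapping the two documents yields the mirror-image many-to-one match.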
