Locating parallel word sequences in electronic documents

  • US 8,560,297 B2
  • Filed: 06/07/2010
  • Issued: 10/15/2013
  • Est. Priority Date: 06/07/2010
  • Status: Active Grant
First Claim

1. A method comprising the following computer-executable acts:

  • receiving a first electronic document, wherein the first electronic document comprises a first set of word sequences;

    receiving a second electronic document, wherein the second electronic document comprises a second set of word sequences, wherein a word sequence pair comprises a word sequence from the first set of word sequences and a word sequence from the second set of word sequences or an empty word sequence, and wherein the second document comprises a hyperlink to the first document;

    automatically correlating the first electronic document and the second electronic document based at least in part upon the hyperlink;

    assigning a respective label to each word sequence pair to generate a plurality of possible alignments of word sequences in the first set of word sequences with respect to word sequences in the second set of word sequences;

    assigning respective scores to a plurality of different alignments, wherein a score is based at least in part upon a plurality of features comprising:

      a first distortion feature that indicates a difference between a position of a previously aligned word sequence and a currently aligned word sequence with respect to at least one word sequence in the first set of word sequences and the respective word sequences in the second set of word sequences; and

      a second distortion feature that is indicative of a difference between:

        an actual position of the currently aligned word sequence in the second electronic document relative to the previously aligned word sequence in the second electronic document; and

        an expected position of the currently aligned word sequence in the second electronic document, the expected position being adjacent to the previously aligned word sequence; and

    causing a highest score assigned to an alignment amongst all scores assigned to the plurality of different alignments to be stored in a data repository, wherein the score is indicative of an amount of parallelism between word sequences aligned in the alignment.
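
The scoring acts recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how the features are combined, so the linear weighting, the weight values, the penalty for empty word sequences, and every name here (`distortion_features`, `score`, `store_best`, `repository`) are assumptions made for illustration. An alignment is modeled as one label per word sequence in the first document, where each label is either an index into the second document or `None` for the empty word sequence.

```python
from typing import Dict, Iterable, Optional, Tuple

# One label per word sequence in the first document: the index of the aligned
# word sequence in the second document, or None for the empty word sequence.
Alignment = Tuple[Optional[int], ...]


def distortion_features(alignment: Alignment) -> Tuple[int, int]:
    """Return (jump, deviation) totals for one alignment.

    jump      -- first distortion feature (assumed form): distance between the
                 previously aligned and currently aligned positions.
    deviation -- second distortion feature (assumed form): distance between the
                 actual position and the expected position, which is the one
                 adjacent to the previously aligned position.
    """
    jump = 0
    deviation = 0
    prev: Optional[int] = None
    for cur in alignment:
        if cur is None:          # aligned to the empty word sequence
            continue
        if prev is not None:
            jump += abs(cur - prev)
            deviation += abs(cur - (prev + 1))
        prev = cur
    return jump, deviation


def score(alignment: Alignment,
          w_jump: float = -0.5,
          w_deviation: float = -1.0,
          w_empty: float = -2.0) -> float:
    """Combine the features linearly. The weights are hypothetical."""
    jump, deviation = distortion_features(alignment)
    empties = sum(1 for label in alignment if label is None)
    return w_jump * jump + w_deviation * deviation + w_empty * empties


def store_best(candidates: Iterable[Alignment],
               repository: Dict[str, object]) -> None:
    """Score every candidate and store the highest-scoring one."""
    best = max(candidates, key=score)
    repository["best_alignment"] = best
    repository["best_score"] = score(best)


# A plain dict stands in for the data repository.
repository: Dict[str, object] = {}
store_best([(0, 1, 2), (2, 0, 1), (0, None, 2)], repository)
print(repository)
```

In this sketch a strictly monotone, adjacent alignment such as `(0, 1, 2)` incurs no deviation penalty, so it outscores the reordered and partially empty candidates. A practical scorer would presumably add content-based features alongside the two distortion features, since the claim only requires that the score be based "at least in part" on them.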
