Method and apparatus for extracting relevant data

  • US 7,103,838 B1
  • Filed: 04/04/2002
  • Issued: 09/05/2006
  • Est. Priority Date: 08/18/2000
  • Status: Active Grant
First Claim

1. A method of extraction, comprising:

  • accessing at least a first set of data of a first document, the first document including markup language, wherein the first set of data includes a first selected subset and a second selected subset, such that the second selected subset of data is a subset of the first selected subset of data, the first selected subset at least partly specifying document data, the second selected subset at least partly specifying document data;

    accessing at least a second set of data of a second document, the second document including markup language;

    determining a first edit sequence between at least part of the first set of data and at least part of the second set of data, the first edit sequence including any of insertions, deletions, substitutions, matches, and repetitions, including:

    considering at least repetitions for inclusion in the first edit sequence between at least part of the first set of data and at least part of the second set of data;

    finding a first corresponding subset of the second set of data, the first corresponding subset having a correspondence to the first selected subset, the correspondence at least partly found by determining the first edit sequence;

    determining a second edit sequence between at least part of the first set of data and at least part of the second set of data, the first set of data including at least part of the first selected subset, the second set of data including at least part of the first corresponding subset, the second edit sequence including any of insertions, deletions, substitutions, matches, and repetitions, including:

    considering at least repetitions for inclusion in the second edit sequence between at least part of the first set of data and at least part of the second set of data, the first set of data including at least part of the first selected subset; and

    finding a second corresponding subset of the second set of data, the second corresponding subset having a correspondence to the second selected subset, the correspondence at least partly found by determining the second edit sequence;

    wherein subsequent sets of data of documents are received, the documents including markup language, document data of the subsequent sets of data are determined by finding corresponding data of the subsequent sets of data, the corresponding data of the subsequent sets correspond to the selected data of earlier sets of data, the corresponding data of the subsequent sets are identified as selected data of the subsequent sets of data, the selected data of the subsequent sets of data at least partly specifying document data, and at least one of selected data of the earlier sets and the selected data of the subsequent data at least partly determine corresponding data of later sets of data, the earlier sets of data are received earlier than the subsequent sets of data, and the later sets of data are received later than the subsequent sets of data.
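
The two-stage matching recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a hypothetical regex tokenizer, unit costs for insertions, deletions, and substitutions, a cheaper "repeat" operation (cost 0.5) that lets a block of tokens in the second document reuse the block of the first document just aligned, and a comparison that treats all text tokens as equal so that only the markup structure drives the alignment. The names tokenize, key, edit_sequence, corresponding_span, rep_cost, and max_block are illustrative and do not come from the patent.

```python
import re

def tokenize(markup):
    """Split markup into tag tokens and text tokens (hypothetical tokenizer)."""
    return [t for t in re.findall(r"<[^>]+>|[^<]+", markup) if t.strip()]

def key(token):
    """Assumption: tags compare by their literal text, all text compares equal."""
    return token if token.startswith("<") else "#text"

def edit_sequence(a, b, rep_cost=0.5, max_block=8):
    """Return [(op, i, j)] aligning token lists a and b.

    ops: match, substitute, delete, insert, repeat. A 'repeat' of length k
    means b[j-k:j] is one more copy of the block a[i-k:i].
    """
    ka, kb = [key(t) for t in a], [key(t) for t in b]
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i and j:
                same = ka[i - 1] == kb[j - 1]
                op = "match" if same else "substitute"
                cands.append((cost[i - 1][j - 1] + (0 if same else 1), op, 1))
            if i:
                cands.append((cost[i - 1][j] + 1, "delete", 1))
            if j:
                cands.append((cost[i][j - 1] + 1, "insert", 1))
            # Consider repetitions: b repeats the last k tokens of a[:i].
            for k in range(1, min(i, j, max_block) + 1):
                if ka[i - k:i] == kb[j - k:j]:
                    cands.append((cost[i][j - k] + rep_cost, "repeat", k))
            cost[i][j], op, k = min(cands)
            back[i][j] = (op, k)
    # Walk the back-pointers from the end to recover the edit sequence.
    ops, i, j = [], n, m
    while i or j:
        op, k = back[i][j]
        if op in ("match", "substitute"):
            ops.append((op, i - 1, j - 1)); i -= 1; j -= 1
        elif op == "delete":
            ops.append((op, i - 1, None)); i -= 1
        elif op == "insert":
            ops.append((op, None, j - 1)); j -= 1
        else:
            ops.extend(("repeat", i - d, j - d) for d in range(1, k + 1))
            j -= k
    return ops[::-1]

def corresponding_span(ops, start, end):
    """Map the selection a[start:end] to the smallest covering span of b."""
    js = [j for _, i, j in ops
          if i is not None and j is not None and start <= i < end]
    return (min(js), max(js) + 1) if js else None

# First document: the selected list (outer subset) and one item in it (inner subset).
t1 = tokenize("<html><h1>News</h1><ul><li>A</li></ul><p>foot</p></html>")
# Second document: an extra block was inserted and the list now has three items.
t2 = tokenize("<html><h1>News</h1><div>ad</div>"
              "<ul><li>X</li><li>Y</li><li>Z</li></ul><p>foot</p></html>")

outer = (4, 9)   # tokens of <ul>...</ul> in t1
inner = (5, 8)   # tokens of <li>A</li> in t1, nested inside outer

# Stage 1: first edit sequence over the documents locates the outer selection.
lo, hi = corresponding_span(edit_sequence(t1, t2), *outer)

# Stage 2: second edit sequence, restricted to the outer selection and its match.
sub_ops = edit_sequence(t1[outer[0]:outer[1]], t2[lo:hi])
s, e = corresponding_span(sub_ops, inner[0] - outer[0], inner[1] - outer[0])
items = [t for t in t2[lo + s:lo + e] if not t.startswith("<")]
print(items)
```

In this sketch the single selected list item in the first document maps onto all three items of the second document, because the extra items are absorbed as repetitions rather than as unrelated insertions. Feeding the spans found in one document back in as the selection for the next document would follow the pattern of the final "wherein" clause, although that loop is not shown here.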
