Method for automatic wrapper repair

  • US 7,440,974 B2
  • Filed: 12/05/2005
  • Issued: 10/21/2008
  • Est. Priority Date: 07/18/2002
  • Status: Expired due to Fees
First Claim

1. A method of information extraction from a Web page using an initial wrapper which has become partially inoperative, comprising:

  • wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information;

    extracting strings from the Web page parsed in a forward direction using the initial set of rules;

    analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper;

    assigning labels to those strings which satisfy the label rules;

    extracting strings from the Web page in a backward (opposite) direction using the initial set of rules;

    analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and

    assigning labels to those unlabeled strings which satisfy the label rules;

    wherein the initial wrapper W comprises a triple (T, L, R), where T is an input tokenizer, L is a semantic label set and R is a set of extraction rules R = {ri}, where each rule ri is a triple (p, s, l), where p ∈ Sl and s ∈ Su are prefix and suffix, and l ∈ L.
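
The two passes recited in the claim can be illustrated with a small Python sketch. It is not the patented implementation. It assumes a simplified model in which the tokenizer T splits the page into tags and text chunks, the forward pass labels a string when the token before it equals a rule's prefix p, and the backward pass labels a still-unlabeled string when the token after it equals a rule's suffix s. The names `Rule`, `tokenize`, `extract`, `RULES` and `PAGE` are hypothetical, as is the sample markup.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """One extraction rule (p, s, l) of the wrapper W = (T, L, R)."""
    prefix: str  # p: token expected immediately before the string
    suffix: str  # s: token expected immediately after the string
    label: str   # l: a label taken from the label set L


def tokenize(page: str) -> list[str]:
    """Hypothetical tokenizer T: split the page into tags and text chunks."""
    return [t for t in re.split(r"(<[^>]+>)", page) if t.strip()]


def is_tag(token: str) -> bool:
    return token.startswith("<") and token.endswith(">")


def extract(page: str, rules: list[Rule]) -> list[tuple[str, Optional[str]]]:
    tokens = tokenize(page)
    labels: dict[int, str] = {}

    # Forward pass: label a string when its preceding token matches a prefix.
    for i in range(1, len(tokens)):
        if is_tag(tokens[i]):
            continue
        for rule in rules:
            if tokens[i - 1] == rule.prefix:
                labels[i] = rule.label
                break

    # Backward pass: scan from the end of the page and label only the
    # strings left unlabeled, using the token that follows them (suffix).
    for i in range(len(tokens) - 2, -1, -1):
        if is_tag(tokens[i]) or i in labels:
            continue
        for rule in rules:
            if tokens[i + 1] == rule.suffix:
                labels[i] = rule.label
                break

    return [(tok, labels.get(i)) for i, tok in enumerate(tokens) if not is_tag(tok)]


# Rules written for the old page layout (assumed for illustration).
RULES = [Rule("<dt>", "<dd>", "name"), Rule("<dd>", "<dt>", "description")]

# The page after a redesign: <dt> gained an attribute, so the "name"
# prefix no longer matches and the wrapper is partially inoperative.
PAGE = '<dl><dt class="n">Xerox<dd>Printers<dt class="n">Acme<dd>Anvils</dl>'

if __name__ == "__main__":
    for string, label in extract(PAGE, RULES):
        print(string, label)
```

In this example the forward pass labels only "Printers" and "Anvils", because the changed `<dt class="n">` tag no longer matches the "name" prefix. The backward pass then recovers "Xerox" and "Acme" as "name" from the unchanged `<dd>` suffix.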
