Method for automatic wrapper repair
First Claim
1. A method of information extraction from a Web page using an initial wrapper which has become partially inoperative, wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information, the method comprising:
extracting strings from the Web page parsed in the forward direction using the initial set of rules;
analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper;
assigning labels to those strings which satisfy the label rules;
extracting strings from the Web page in the backward (opposite) direction using the initial set of rules;
analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and
assigning labels to those unlabeled strings which satisfy the label rules.
Abstract
A method of information extraction from a Web page using an initial wrapper which has become partially inoperative, wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information. The method includes using the initial set of rules to extract strings from the Web page parsed in the forward direction; analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper; assigning labels to those strings which satisfy the label rules; using the initial set of rules to extract strings from the Web page in the backward (opposite) direction; analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and assigning labels to those unlabeled strings which satisfy the label rules.
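The forward-then-backward idea can be illustrated with a small sketch. This is a toy model under stated assumptions, not the claimed implementation: the `Rule` class, the delimiter-based extraction, the regex label checks, and the sample page are all hypothetical. Each rule extracts the string between two delimiters and assigns its label only if the string matches the rule's pattern. The forward pass stops at the first rule that no longer applies; the backward pass then scans from the end of the page and labels strings the forward pass could not reach.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical wrapper rule: delimiters for extraction, pattern for labeling."""
    label: str    # label from the wrapper's set of labels
    left: str     # delimiter expected before the string
    right: str    # delimiter expected after the string
    pattern: str  # label rule: the string must fully match this regex


def forward_pass(page: str, rules: List[Rule]) -> Dict[str, str]:
    """Apply the rules left to right; stop at the first rule that fails."""
    labeled: Dict[str, str] = {}
    pos = 0
    for rule in rules:
        start = page.find(rule.left, pos)
        if start < 0:
            break  # the wrapper is inoperative from this point on
        start += len(rule.left)
        end = page.find(rule.right, start)
        if end < 0:
            break
        text = page[start:end]
        if re.fullmatch(rule.pattern, text):
            labeled[rule.label] = text
        pos = end + len(rule.right)
    return labeled


def backward_pass(page: str, rules: List[Rule],
                  labeled: Dict[str, str]) -> Dict[str, str]:
    """Apply the same rules right to left, labeling only unlabeled strings."""
    result = dict(labeled)
    pos = len(page)
    for rule in reversed(rules):
        if rule.label in result:
            break  # reached the part the forward pass already handled
        end = page.rfind(rule.right, 0, pos)
        if end < 0:
            break
        start = page.rfind(rule.left, 0, end)
        if start < 0:
            break
        text = page[start + len(rule.left):end]
        if re.fullmatch(rule.pattern, text):
            result[rule.label] = text
        pos = start
    return result


# Rules written for the old page layout: <b>title</b><i>author</i><u>year</u>
RULES = [
    Rule("title", "<b>", "</b>", r"[\w ]+"),
    Rule("author", "<i>", "</i>", r"[A-Za-z. ]+"),
    Rule("year", "<u>", "</u>", r"\d{4}"),
]

# The page now marks the author with <em> instead of <i>.
PAGE = "<b>Web Wrappers</b><em>Jane Doe</em><u>2001</u>"

forward = forward_pass(PAGE, RULES)
repaired = backward_pass(PAGE, RULES, forward)
print(forward)
print(repaired)
```

In this example the forward pass labels only the title, because the author rule no longer matches. The backward pass recovers the year from the end of the page and stops at the author, leaving exactly the changed region unlabeled and therefore identified as the part of the wrapper that needs repair.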
28 Citations
5 Claims
Specification