ROBUST WRAPPERS FOR WEB EXTRACTION
Abstract
A computer-implemented method to determine a robust wrapper for a data item in a document, such as a web document written in a markup language, includes developing a model indicative of the temporal history of the document. Based on the developed model, robustness characteristics are determined for a plurality of different wrappers representing associated paths to the data item in a representation of the document. Based on a result of the determining operation, one of the plurality of wrappers that has a desired robustness characteristic is provided as a result wrapper.
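The three steps of the abstract (build a model from archival data, score candidate wrappers, return the most robust one) can be illustrated with a minimal sketch. It rests on assumptions not taken from this section: the "model" is replaced by a simple empirical stand-in (the fraction of archived snapshots on which a wrapper still returns the labeled value), wrappers are written in the XPath subset supported by Python's standard `xml.etree.ElementTree`, and the function names (`extract`, `robustness`, `most_robust`) and sample snapshots are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract(wrapper, markup):
    """Apply an XPath-like wrapper; return the text of the first match, or None."""
    node = ET.fromstring(markup).find(wrapper)
    return node.text if node is not None else None


def robustness(wrapper, snapshots):
    """Stand-in robustness score: share of archived snapshots on which
    the wrapper still extracts the expected (labeled) value."""
    hits = sum(1 for markup, expected in snapshots
               if extract(wrapper, markup) == expected)
    return hits / len(snapshots)


def most_robust(wrappers, snapshots):
    """Score every candidate wrapper and return the best one plus all scores."""
    scores = {w: robustness(w, snapshots) for w in wrappers}
    return max(scores, key=scores.get), scores


# Hypothetical archival data: (snapshot markup, value of the data item in it).
snapshots = [
    ("<html><body><div class='price'>10</div>"
     "<div class='name'>Widget</div></body></html>", "10"),
    ("<html><body><div class='ad'>Buy</div>"
     "<div class='price'>12</div></body></html>", "12"),
    ("<html><body><div class='price'>11</div></body></html>", "11"),
]

# Two candidate wrappers, i.e. two different paths to the same data item.
wrappers = [
    "./body/div[1]",            # positional path
    ".//div[@class='price']",   # attribute-based path
]

best, scores = most_robust(wrappers, snapshots)
print(best, scores)
```

In this toy history the positional path breaks when an extra element is inserted ahead of the data item, while the attribute-based path survives every snapshot, so the latter is returned as the result wrapper. A fuller treatment would score wrappers against a model of likely future changes rather than replaying past snapshots.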
19 Claims
1. A computer-implemented method to determine a robust wrapper representing a data item of a plurality of data items in a document represented by a markup language, comprising:
- based on archival data representative of a temporal history of the document, developing a model indicative of the temporal history;
- based on the developed model, determining robustness characteristics for a plurality of different wrappers representing associated paths to the data item in a representation of the document;
- based on a result of the determining operation, providing, as a result wrapper, one of the plurality of wrappers that has a desired robustness characteristic.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computing system for determining a robust wrapper representing a data item of a plurality of data items in a document represented by a markup language, wherein the computing system is operable to:
- based on archival data representative of a temporal history of the document, develop a model indicative of the temporal history;
- based on the developed model, determine robustness characteristics for a plurality of different wrappers representing associated paths to the data item in a representation of the document; and
- based on a result from the determination of the robustness characteristics, provide, as a result wrapper, one of the plurality of wrappers that has a desired robustness characteristic.
Dependent claims: 11, 12, 13, 14
15. A computer readable medium embodied in a tangible form including executable computer program code operable to determine a robust wrapper representing a data item of a plurality of data items in a document represented by a markup language, wherein the computer readable medium includes:
- executable computer code operable to develop, based on archival data representative of a temporal history of the document, a model indicative of the temporal history;
- executable computer code operable to determine, based on the developed model, robustness characteristics for a plurality of different wrappers representing associated paths to the data item in a representation of the document; and
- executable computer code operable to provide, based on a result of the determining operation and as a result wrapper, one of the plurality of wrappers that has a desired robustness characteristic.
Dependent claims: 16, 17, 18, 19
Specification