RULES-BASED EXTRACTION OF DATA FROM WEB PAGES
Abstract
A rule creation application uses a reference web page, and user input regarding information displayed thereon, to generate a rule for extracting such information from the web page. The rule uses a structured graph representation of the web page, such as the page's Document Object Model (DOM), to extract the information. In addition to being applicable to the reference web page, the rule may be used to extract information from other web pages that have a similar structure.
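As a rough illustration of how a path-based rule can be reused across pages that share a layout, the following Python sketch applies one rule to two pages. The rule format (a field name plus a list of (tag, position) steps), the apply_rule helper, and the sample pages are hypothetical, and the pages are written as well-formed XHTML so that the standard-library xml.etree.ElementTree parser can stand in for a browser's DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule format: a field name plus a path of (tag, position) steps,
# where position is 1-based among siblings sharing that tag.
RULE = {"field": "price", "path": [("body", 1), ("div", 2), ("span", 1)]}


def apply_rule(rule, page_source):
    """Walk the rule's path from the root element and return the text found there."""
    node = ET.fromstring(page_source)  # root element, here <html>
    for tag, position in rule["path"]:
        candidates = node.findall(tag)  # direct children with this tag, in document order
        if len(candidates) < position:
            return None  # structure differs from the reference page
        node = candidates[position - 1]
    return (node.text or "").strip()


reference_page = "<html><body><div>Widget</div><div><span>$9.99</span></div></body></html>"
similar_page = "<html><body><div>Gadget</div><div><span>$24.50</span></div></body></html>"

print(apply_rule(RULE, reference_page))  # $9.99
print(apply_rule(RULE, similar_page))    # $24.50
```

If a page lacks a step the rule expects, this sketch returns None rather than guessing, which is one simple way to signal that the page does not have a similar structure.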
45 Claims
1-20. (canceled)
21. A computer program comprising executable instructions represented in computer storage, said computer program adapted to be executed on a computer, and being capable of causing the computer to:
receive user input specifying a selected data element of a web page displayed on the computer;
identify, in a structural graph representation of said web page, a node that corresponds to the selected data element;
identify a path to said node, said path specifying how to traverse the structural graph representation of the web page to reach the node; and
generate a rule that is adapted to be applied to the web page, and to other web pages of similar structure, to extract web page data corresponding to the selected data element, the rule specifying said path.
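The path-identification and rule-generation steps recited above can be sketched as follows. This is an illustrative assumption, not the patented implementation: path_to and generate_rule are hypothetical helpers, the user's selection is simulated by picking an element programmatically, and each step records the element's tag and its 1-based position among same-tag siblings, similar in spirit to an XPath expression such as /html/body/div[2]/span.

```python
import xml.etree.ElementTree as ET


def path_to(root, target):
    """Return the (tag, position) steps leading from root down to target."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [sibling for sibling in parent if sibling.tag == node.tag]
        steps.append((node.tag, same_tag.index(node) + 1))  # 1-based position
        node = parent
    steps.reverse()  # steps were collected bottom-up
    return steps


def generate_rule(root, selected, field):
    """Package the path to the selected node as a hypothetical extraction rule."""
    return {"field": field, "path": path_to(root, selected)}


page = "<html><body><div>Widget</div><div><span>$9.99</span></div></body></html>"
root = ET.fromstring(page)
selected = root.find(".//span")  # stands in for the element the user clicked
rule = generate_rule(root, selected, "price")
print(rule)  # {'field': 'price', 'path': [('body', 1), ('div', 2), ('span', 1)]}
```

Because positions are counted among same-tag siblings, such a rule tolerates changed text but not an extra div inserted before the target; a more robust variant might also record attributes such as id or class.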
32. A computer-implemented method, comprising:
receiving user input specifying selected content of a web page displayed on a computer;
identifying, in a structural graph representation of said web page, a node that corresponds to the selected content;
identifying a path to said node, said path specifying how the structural graph representation of the web page is traversable to reach the node; and
generating a rule that is adapted to be applied to the web page, and to other web pages of similar structure, to extract data corresponding to said selected content, said rule specifying said path.
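Claim 32 speaks of selected content rather than a selected element. One simple, hypothetical way to map a text selection back to a node is to search the tree for the element whose own text matches the selection; the node_for_selection helper below does this with xml.etree.ElementTree, again assuming well-formed XHTML.

```python
import xml.etree.ElementTree as ET


def node_for_selection(root, selected_text):
    """Return the first element, in document order, whose own text equals the selection."""
    wanted = selected_text.strip()
    if not wanted:
        return None  # an empty selection identifies nothing
    for element in root.iter():  # pre-order traversal, root included
        # element.text is only the text before the element's first child
        if (element.text or "").strip() == wanted:
            return element
    return None  # no single element holds exactly this text


page = "<html><body><h1>Widget</h1><p>In stock</p><p>Ships in <b>2</b> days</p></body></html>"
root = ET.fromstring(page)
print(node_for_selection(root, "In stock").tag)  # p
print(node_for_selection(root, "2").tag)         # b
```

This heuristic fails when a selection spans several elements (for example "Ships in 2 days" above) or when the same text appears more than once; a browser-based tool would more likely read the selected node directly from the browser's selection API.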
43. A computer system, comprising:
a rule generation component that runs on a user computer in conjunction with a browser program, the rule generation component configured to receive user input specifying a data element of a web page displayed on the user computer by the browser program, and to use a Document Object Model (DOM) representation of the web page to generate a rule for extracting the data element from the web page; and
a server system that communicates with the rule generation component over a network, and stores rules generated by the rule generation component in association with web site addresses to which such rules correspond.
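The server-side storage recited in claim 43 could be modeled, in its simplest form, as a mapping from site addresses to lists of rules. The sketch below is an in-memory stand-in with no networking; the RuleStore class and the choice to key rules by host name (so that all pages on one site share their rules) are assumptions, since the claim does not say how web site addresses are normalized.

```python
from urllib.parse import urlsplit


class RuleStore:
    """In-memory stand-in for a server-side rule repository (hypothetical)."""

    def __init__(self):
        self._rules_by_site = {}

    @staticmethod
    def _site_key(url):
        # urlsplit(...).hostname is already lower-cased; None for relative URLs.
        return urlsplit(url).hostname or ""

    def store(self, url, rule):
        """Record a rule under the site that the given page URL belongs to."""
        self._rules_by_site.setdefault(self._site_key(url), []).append(rule)

    def rules_for(self, url):
        """Return a copy of the rules stored for the site hosting this URL."""
        return list(self._rules_by_site.get(self._site_key(url), []))


store = RuleStore()
price_rule = {"field": "price", "path": [("body", 1), ("div", 2), ("span", 1)]}
store.store("https://shop.example.com/item/1", price_rule)
print(store.rules_for("https://SHOP.example.com/item/2"))  # one rule: same site
print(store.rules_for("https://other.example.org/"))       # []
```

Keying by host name is only one option; a real system might instead store full URLs or URL patterns so that different page templates on the same site can carry different rules.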
Specification