Rules-based identification of items represented on web pages
First Claim
1. A method of identifying an item represented on a web page retrieved by a client computer, the method comprising, on the client computer:
sending a request to a server, wherein the request specifies at least a portion of an electronic address from which the web page was retrieved;
receiving a rule in response to the request, wherein the rule is associated with a plurality of web pages that share a common web page structure, and is capable of using the common web page structure to extract information from said web pages;
applying the rule to the web page, wherein applying the rule comprises, on the client computer, traversing a structural graph representation of the web page, according to a path specified by the rule, to obtain access to a node of the structural graph representation;
extracting from the web page, based at least upon access to the node, item-identifying data descriptive of the item represented on the web page;
using the item-identifying data extracted from the web page to retrieve, over a network, supplemental information about the item; and
displaying said supplemental information on the client computer in conjunction with a display of the web page; and
wherein the supplemental information comprises information previously extracted from a second web page via application of a second rule to the second web page, said second web page including information about said item.
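The client-side steps of claim 1 can be illustrated with a small sketch. It makes several assumptions that the claim does not specify, all hypothetical: the page is well-formed XHTML parsed with Python's `xml.etree.ElementTree` (standing in for a browser DOM); a rule's path is a sequence of `(tag, index)` steps, where `index` counts same-tag children from 0; the "server" is an in-memory dict (`RULE_SERVER`) keyed by host plus path prefix; and the supplemental data previously extracted from a second page is another dict (`SUPPLEMENTAL_DB`). The functions `request_rule`, `traverse`, and `identify_item` are illustrative names, not the patent's.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format: (tag, index) steps from the root element,
    # where index counts only children with that tag (0-based).
    path: tuple[tuple[str, int], ...]


# Stand-in for the server: one rule covers every page under a host and path
# prefix, i.e. pages assumed to share a common structure.
RULE_SERVER = {
    ("shop.example.com", "/product/"): Rule(path=(("body", 0), ("div", 1), ("h1", 0))),
}

# Stand-in for supplemental information previously extracted from another
# page (by applying a second rule to it).
SUPPLEMENTAL_DB = {
    "Acme Widget": {"other_prices": ["19.99 at other.example.net"]},
}


def request_rule(url: str) -> Rule | None:
    """Send (part of) the page's address and receive a matching rule, if any."""
    parts = urlsplit(url)
    for (host, prefix), rule in RULE_SERVER.items():
        if parts.hostname == host and parts.path.startswith(prefix):
            return rule
    return None


def traverse(root: ET.Element, path) -> ET.Element | None:
    """Walk the DOM tree along the rule's path to reach the target node."""
    node = root
    for tag, index in path:
        same_tag = [child for child in node if child.tag == tag]
        if index >= len(same_tag):
            return None  # this page does not have the expected structure
        node = same_tag[index]
    return node


def identify_item(url: str, page_source: str) -> dict | None:
    """Apply the rule, extract item-identifying text, look up supplemental info."""
    rule = request_rule(url)
    if rule is None:
        return None
    node = traverse(ET.fromstring(page_source), rule.path)
    if node is None:
        return None
    item_name = "".join(node.itertext()).strip()
    return {"item": item_name, "supplemental": SUPPLEMENTAL_DB.get(item_name)}


page = """<html><body>
  <div>navigation</div>
  <div><h1>Acme Widget</h1><span>24.99</span></div>
</body></html>"""

print(identify_item("https://shop.example.com/product/123", page))
```

In a real client, the returned supplemental data would be rendered alongside the page (for example, in a browser toolbar or side panel) rather than printed. Keying rules by a URL prefix is one simple way to associate a single rule with many structurally similar pages.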
1 Assignment
Abstract
The Document Object Model (DOM) of a sampled web page is used to create a rule that extracts item-related data from web pages having a similar DOM structure to the sampled web page. In response to a user request for such a web page, the rule is retrieved from a data server based on the page's URL, and is applied to the DOM of the web page to extract item-identifying data. The item is then identified, preferably by the data server, by matching the item-identifying data to an item in a database. Supplemental information about the item is then retrieved from the database and supplied to the user's computer for viewing in conjunction with the requested page. In a preferred embodiment, the rule is retrieved from the data server and applied to the web page by a client application that runs in conjunction with a web browser.
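The two server-side steps in the abstract, creating a rule from a sampled page's DOM and matching extracted data to a database item, might look like the sketch below. The abstract does not say how either is done, so this assumes, hypothetically, that an operator has already located the element holding the item name in the sample page, that a rule is recorded as a path of `(tag, index)` steps (index counting same-tag siblings from 0), and that matching is an exact lookup after lowercasing and collapsing whitespace. `derive_path`, `match_item`, and `CATALOG` are illustrative names only.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def derive_path(root: ET.Element, target: ET.Element) -> tuple[tuple[str, int], ...]:
    """Record the (tag, index) path from the root to a chosen node in a sample page."""
    # ElementTree elements have no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        index = next(i for i, c in enumerate(same_tag) if c is node)
        steps.append((node.tag, index))
        node = parent
    return tuple(reversed(steps))


# Hypothetical item database keyed by normalized item name.
CATALOG = {"acme widget": "ITEM-001"}


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_item(item_data: str) -> str | None:
    """Match extracted item-identifying text to an item ID, if one exists."""
    return CATALOG.get(normalize(item_data))


sample = "<html><body><div>nav</div><div><h1>Acme Widget</h1></div></body></html>"
root = ET.fromstring(sample)
target = root[0][1][0]  # the <h1> chosen in the sampled page
print(derive_path(root, target))
print(match_item("  Acme   WIDGET "))
```

A real system would likely need fuzzier matching (for example, on identifiers such as ISBNs or UPCs when a page exposes them), but normalized exact lookup keeps the sketch small.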
177 Citations
14 Claims
Specification