Rules-based identification of items represented on web pages
Abstract
The Document Object Model (DOM) of a sampled web page is used to create a rule that extracts item-related data from web pages having a similar DOM structure to the sampled web page. In response to a user request for such a web page, the rule is retrieved from a data server based on the page's URL, and is applied to the DOM of the web page to extract item-identifying data. The item is then identified, preferably by the data server, by matching the item-identifying data to an item in a database. Supplemental information about the item is then retrieved from the database and supplied to the user's computer for viewing in conjunction with the requested page. In a preferred embodiment, the rule is retrieved from the data server and applied to the web page by a client application that runs in conjunction with a web browser.
47 Claims
1. A method of identifying a product represented on a web page, the method comprising:
(a) traversing a path through a structured graph representation of the web page to a node of the representation;
(b) obtaining an element of product-identifying data stored in association with the node; and
(c) identifying the product based at least upon the element of product-identifying data.
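Steps (a) through (c) can be illustrated with a minimal sketch. It assumes the structured graph representation is a DOM-like tree parsed with Python's `xml.etree.ElementTree`, that a path is a list of child indices, and that a hypothetical in-memory `CATALOG` stands in for a product database. The page content, the path, and all names below are illustrative assumptions, not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page; a real page would come from a browser DOM.
PAGE = (
    "<html><body>"
    "<div><h1>Example Book</h1></div>"
    "<div><span>ISBN</span><span>0123456789</span></div>"
    "</body></html>"
)

# Hypothetical product database keyed by ISBN.
CATALOG = {"0123456789": "Example Book (paperback)"}


def traverse(root, path):
    """Step (a): follow a list of child indices from the root to a node."""
    node = root
    for index in path:
        node = node[index]
    return node


def identify_product(page_source, path):
    root = ET.fromstring(page_source)
    node = traverse(root, path)
    # Step (b): read the product-identifying data stored with the node.
    identifier = (node.text or "").strip()
    # Step (c): identify the product from that data (None if unknown).
    return CATALOG.get(identifier)


# html -> body (0) -> second div (1) -> second span (1)
product = identify_product(PAGE, [0, 1, 1])
```

An index path is only one possible encoding; it is chosen here because it depends solely on the tree structure, which is what pages generated from a shared template have in common.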
7. A method of identifying an item represented on a web page, the method comprising:
(a) receiving an identification of a portion of a first web page, wherein the portion comprises item-identifying data descriptive of a first item represented on the first web page;
(b) identifying a first node of a structured graph representation of the first web page based at least upon the identified portion;
(c) creating a path through the structured graph representation of the first web page to the first node;
(d) traversing the path through a structured graph representation of a second web page to obtain access to a second node; and
(e) obtaining, based at least upon access to the second node, item-identifying data descriptive of a second item represented on the second web page.
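The following sketch shows one way steps (a) through (e) could fit together: a path is derived from a selected node on a first page and then replayed on a second page with the same structure. It assumes the "identification of a portion" arrives as the selected text, that pages are well-formed enough for `xml.etree.ElementTree`, and that a path is a list of child indices. The page sources and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

FIRST_PAGE = (
    "<html><body><div><h1>First Title</h1></div>"
    "<div><b>ISBN</b><i>1111111111</i></div></body></html>"
)
SECOND_PAGE = (
    "<html><body><div><h1>Second Title</h1></div>"
    "<div><b>ISBN</b><i>2222222222</i></div></body></html>"
)


def find_node(root, selected_text):
    """Step (b): locate the first node whose text equals the selected portion."""
    for node in root.iter():
        if (node.text or "").strip() == selected_text:
            return node
    return None


def create_path(root, target):
    """Step (c): record the child indices leading from the root to target."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    path = []
    node = target
    while node is not root:
        parent = parent_of[node]
        path.append(list(parent).index(node))
        node = parent
    path.reverse()
    return path


def traverse(root, path):
    """Step (d): replay the recorded path on another tree."""
    node = root
    for index in path:
        node = node[index]
    return node


first_root = ET.fromstring(FIRST_PAGE)
second_root = ET.fromstring(SECOND_PAGE)
first_node = find_node(first_root, "1111111111")    # steps (a) and (b)
path = create_path(first_root, first_node)           # step (c)
second_node = traverse(second_root, path)            # step (d)
second_item_data = second_node.text                  # step (e)
```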
14. A method of collecting and providing supplemental information about items represented on web pages, the method comprising:
(a) applying a first rule to a first web page to obtain information related to an item represented on the first web page;
(b) storing the information in association with data descriptive of the item;
(c) subsequent to (b), applying a second rule to a second web page to obtain item-identifying data, wherein the second web page comprises a representation of the item, and wherein the item-identifying data are descriptive of the item; and
(d) retrieving the stored information based at least upon the obtained item-identifying data.
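As a rough illustration of steps (a) through (d), the sketch below collects a price from one page and later retrieves it while processing a differently structured page for the same item. It assumes each rule maps field names to the limited XPath expressions supported by `xml.etree.ElementTree`, and that an ISBN serves as the data descriptive of the item. The pages, rules, and the `store` dictionary are all hypothetical.

```python
import xml.etree.ElementTree as ET

MERCHANT_A = (
    "<html><body><p id='isbn'>0123456789</p>"
    "<p id='price'>12.50</p></body></html>"
)
MERCHANT_B = (
    "<html><body><div><span class='code'>0123456789</span></div></body></html>"
)

# Each rule maps a field name to a path through the page's tree.
FIRST_RULE = {"isbn": "./body/p[@id='isbn']", "price": "./body/p[@id='price']"}
SECOND_RULE = {"isbn": "./body/div/span[@class='code']"}


def apply_rule(rule, page_source):
    """Apply every path in the rule and return the text found at each node."""
    root = ET.fromstring(page_source)
    return {field: root.findtext(path) for field, path in rule.items()}


store = {}  # supplemental information keyed by item-identifying data

# Steps (a) and (b): collect information and store it under the item's ISBN.
first = apply_rule(FIRST_RULE, MERCHANT_A)
store.setdefault(first["isbn"], []).append(
    {"source": "merchant-a", "price": first["price"]}
)

# Steps (c) and (d): later, identify the same item on another page and
# retrieve what was stored for it.
second = apply_rule(SECOND_RULE, MERCHANT_B)
supplemental = store.get(second["isbn"], [])
```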
21. A method of identifying an item represented on a web page, the method comprising, on a client computer:
providing a structural graph representation of the web page;
sending a request to a data server, wherein the request comprises at least a portion of a location from which the web page was retrieved;
receiving a rule in response to the request, wherein the rule comprises at least one path, wherein each path is configured to be applied to the structural graph representation;
for each path, traversing the path to obtain access to a node of the structural graph representation; and
for at least one of the nodes, obtaining, based at least upon access to the node, item-identifying data descriptive of the item.
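A client-side sketch of this flow follows. It assumes the "portion of a location" is the URL's host name, that a rule is a dictionary of named index paths, and that `request_rule` is a local stand-in for the network round trip to the data server. None of these names or formats come from the patent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical stand-in for the data server's rule table, keyed by host name.
SERVER_RULES = {
    "books.example.com": {"paths": {"title": [0, 0], "isbn": [0, 1, 1]}},
}


def request_rule(location_portion):
    """Stand-in for sending the request and receiving a rule in response."""
    return SERVER_RULES.get(location_portion)


def extract_item_data(page_url, page_source):
    root = ET.fromstring(page_source)       # structural graph representation
    host = urlsplit(page_url).hostname      # portion of the page's location
    rule = request_rule(host)
    if rule is None:
        return {}
    data = {}
    for name, path in rule["paths"].items():
        node = root
        try:
            for index in path:              # traverse the path to a node
                node = node[index]
        except IndexError:
            continue                        # page does not match this path
        data[name] = (node.text or "").strip()
    return data


PAGE = (
    "<html><body><h1>Example Book</h1>"
    "<p><b>ISBN</b><i>0123456789</i></p></body></html>"
)
data = extract_item_data("https://books.example.com/item/42", PAGE)
```

Skipping a path that fails to resolve, rather than aborting, is a design choice of this sketch: the claim only requires item-identifying data from at least one of the nodes.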
29. A method of creating a rule for extracting item-identifying data from web pages, the method comprising:
(a) receiving an identification of a portion of a web page located at a location, wherein the portion comprises item-identifying data descriptive of an item represented on the web page;
(b) identifying a node of a structured graph representation of the web page based at least upon the identified portion;
(c) creating a path through the structured graph representation of the web page to the first node;
(d) creating a rule based at least upon the path; and
(e) associating the rule with at least a portion of the location of the web page.
35. A system for identifying items represented on web pages, the system comprising:
a first module configured to store a plurality of rules, wherein each rule comprises at least one structured graph representation path, and wherein each rule is associated with a plurality of web page locations;
a second module configured to store item-identifying data for each of a plurality of items;
a third module configured to retrieve at least one of the plurality of rules based at least upon a first web page location;
a fourth module configured to apply the at least one path of the retrieved rule to a structured graph representation of a web page located at the first web page location to extract item-identifying data from the web page;
a fifth module configured to provide the extracted item-identifying data to the second module; and
a sixth module configured to match at least a portion of the extracted item-identifying data to item-identifying data stored by the second module to thereby identify an item.
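The matching performed by the sixth module might look like the sketch below. It assumes the second module's stored data are simple per-item dictionaries and that a match on any single normalized field is sufficient, which is one reading of matching "at least a portion" of the extracted data. The records and the normalization rule are hypothetical.

```python
import re

# Second module: item-identifying data stored per item (hypothetical records).
ITEMS = {
    "item-1": {"isbn": "0123456789", "title": "example book"},
    "item-2": {"isbn": "9876543210", "title": "another book"},
}


def normalize(value):
    """Lower-case the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_item(extracted):
    """Sixth module: return the id of the first item sharing any field."""
    for item_id, stored in ITEMS.items():
        for field, value in extracted.items():
            if field in stored and normalize(value) == normalize(stored[field]):
                return item_id
    return None


# The hyphenated ISBN matches item-1 even though the title does not.
matched = match_item({"isbn": "0-1234-5678-9", "title": "Unknown Edition"})
```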
Specification