METHOD AND SYSTEM FOR AN AGGREGATE WEB SITE SEARCH DATABASE
Abstract
Signature schema documents may be pre-defined using a query language to provide instructions for application by an engine to extract data from web pages of respective web sites. For a particular web page, signature schema instructions identify a web page family for the web page and extract desired data from the web page in accordance with its web page family. The instructions use signatures previously identified within web pages of the same family to distinguish the web page family from others of the web site and to distinguish the desired data from other data for the web page family. A server may make one or more requests to obtain web pages from various web sites and apply respective signature schemas maintained in a repository coupled to the engine. Extracted data can be stored to an aggregate database.
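The two roles of signatures described above (distinguishing a web page family from the others on a site, then distinguishing the desired data within that family) can be illustrated with a minimal sketch. It assumes signatures are plain regular expressions; the `SCHEMA` structure, the family names, and the HTML snippets are hypothetical and are not taken from the patent.

```python
import re
from typing import Optional

# Hypothetical signature schema for one web site. Each page family has a
# "family" signature (tells this family apart from the site's other pages)
# and "fields" signatures (tell the desired data apart from other data).
SCHEMA = {
    "product": {
        "family": re.compile(r'<div class="product-detail">'),
        "fields": {
            "name": re.compile(r'<h1 class="title">(.*?)</h1>'),
            "price": re.compile(r'<span class="price">\$([\d.]+)</span>'),
        },
    },
    "listing": {
        "family": re.compile(r'<ul class="results">'),
        "fields": {
            "item_count": re.compile(r"(\d+) items found"),
        },
    },
}


def identify_family(page: str) -> Optional[str]:
    """Return the first family whose family signature occurs in the page."""
    for family, rules in SCHEMA.items():
        if rules["family"].search(page):
            return family
    return None


def extract(page: str) -> dict:
    """Extract the desired data in accordance with the page's family."""
    family = identify_family(page)
    if family is None:
        return {}
    data = {"family": family}
    for field, signature in SCHEMA[family]["fields"].items():
        match = signature.search(page)
        if match:
            data[field] = match.group(1)
    return data


page = (
    '<html><body><div class="product-detail">'
    '<h1 class="title">Blue Widget</h1>'
    '<span class="price">$19.99</span>'
    "</div></body></html>"
)
print(extract(page))
# → {'family': 'product', 'name': 'Blue Widget', 'price': '19.99'}
```

A page that matches no family signature yields an empty result, which keeps unrelated pages of the site out of the extracted data.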
25 Claims
1. A method of aggregating web site data from one or more web sites, the method comprising:
sending a page request to a web site selected from the one or more web sites;
receiving the requested web page from the selected web site;
retrieving signature schema associated with the requested web page, wherein the signature schema identifies data fields within the requested web page;
applying signature schema to the requested web page to extract data from the requested web page; and
storing extracted data to an aggregate database, wherein the aggregate database comprises data extracted from the one or more web sites.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
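The five steps of claim 1 can be sketched end to end as follows. This is only an illustration under stated assumptions: the schema repository is a hypothetical in-memory dictionary keyed by host name, a schema is a set of regular expressions, the aggregate database is an in-memory SQLite table named `extracted_data`, and the page fetch is passed in as a function (here a stub, `fake_fetch`) so the sketch runs without network access.

```python
import re
import sqlite3
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical repository: host name -> signature schema, where a schema
# maps a field name to a regular expression with one capturing group.
SCHEMA_REPOSITORY: Dict[str, Dict[str, str]] = {
    "shop.example": {
        "name": r'<h1 class="title">(.*?)</h1>',
        "price": r'<span class="price">\$([\d.]+)</span>',
    },
}


def aggregate(url: str, fetch: Callable[[str], str], db: sqlite3.Connection) -> int:
    """Run the claimed steps for one page; return the number of fields stored."""
    page = fetch(url)  # send the page request and receive the page
    schema = SCHEMA_REPOSITORY[urlparse(url).hostname]  # retrieve the schema
    stored = 0
    for field, pattern in schema.items():  # apply the schema to the page
        match = re.search(pattern, page)
        if match:
            db.execute(  # store the extracted data
                "INSERT INTO extracted_data (url, field, value) VALUES (?, ?, ?)",
                (url, field, match.group(1)),
            )
            stored += 1
    db.commit()
    return stored


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request; returns a canned page."""
    return '<h1 class="title">Blue Widget</h1><span class="price">$19.99</span>'


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE extracted_data (url TEXT, field TEXT, value TEXT)")
stored = aggregate("http://shop.example/item/1", fake_fetch, db)
print(stored)  # → 2
```

Passing the fetch function as a parameter is a design choice of this sketch, not something the claim requires; it lets the same pipeline run against live sites or canned pages.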
12. A system for aggregating web site data from one or more web sites, the system comprising:
at least one computing device comprising a processor and a memory coupled thereto, said memory storing instructions and data for configuring the processor to:
send a page request to a web site selected from the one or more web sites;
receive a web page from the selected web site based upon the sent page request;
retrieve signature schema associated with the requested web page;
apply signature schema to the requested web page data to extract data identified by the signature schema; and
store extracted data to an aggregate database comprising data extracted from the one or more web sites.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A computer program product storing computer readable instructions which when executed by a computer processor configure the processor for:
sending a page request to a web site selected from the one or more web sites;
receiving the requested web page from the selected web site;
retrieving signature schema associated with the requested web page, wherein the signature schema identifies data fields within the requested web page;
applying signature schema to the requested web page to extract data from the requested web page; and
storing extracted data to an aggregate database, wherein the aggregate database comprises data extracted from the one or more web sites.
24. A method of aggregating web site data from one or more web sites, the method comprising:
sending a page request to a web site selected from the one or more web sites;
receiving the requested web page from the selected web site;
retrieving signature schema associated with the requested web page, wherein the signature schema identifies data fields within the requested web page, and wherein the signature schema are Extensible Markup Language (XML) documents comprising query language for extracting data from the requested web page;
applying signature schema to the received web page to extract data from the requested web page;
storing extracted data to an aggregate database, wherein the aggregate database comprises data extracted from the one or more web sites;
receiving a search query from a client machine for data stored in the aggregate database;
generating a database query based upon the received search query; and
retrieving data from the aggregate database defined by the query.
25. The method of claim 24 wherein the client machine is a wireless device.
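Claim 24 adds two elements: signature schema expressed as XML documents containing query language, and a search path from a client query to a database query. The sketch below illustrates both under assumptions that are not in the claims: the element and attribute names in `SCHEMA_XML` are invented, the query language is the XPath subset supported by Python's `xml.etree.ElementTree`, the page is well-formed markup, and the aggregate database is an in-memory SQLite table named `products`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML signature schema; each field carries an XPath-style query.
SCHEMA_XML = """
<signatureSchema site="shop.example">
  <field name="name" query=".//h1[@class='title']"/>
  <field name="price" query=".//span[@class='price']"/>
</signatureSchema>
"""

# Canned, well-formed page standing in for a received web page.
PAGE = """
<html><body>
  <h1 class="title">Blue Widget</h1>
  <span class="price">19.99</span>
</body></html>
"""


def apply_schema(schema_xml: str, page: str) -> dict:
    """Evaluate each field's query against the page and collect the text."""
    schema = ET.fromstring(schema_xml)
    root = ET.fromstring(page)
    record = {}
    for field in schema.findall("field"):
        node = root.find(field.get("query"))
        if node is not None and node.text:
            record[field.get("name")] = node.text.strip()
    return record


def search(db: sqlite3.Connection, search_query: str) -> list:
    """Generate a parameterized database query from a client's search text."""
    sql = "SELECT name, price FROM products WHERE name LIKE ? ORDER BY name"
    return db.execute(sql, (f"%{search_query}%",)).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT, price TEXT)")
record = apply_schema(SCHEMA_XML, PAGE)
db.execute("INSERT INTO products (name, price) VALUES (:name, :price)", record)
db.commit()

print(search(db, "widget"))  # → [('Blue Widget', '19.99')]
```

Binding the search text as a parameter, rather than concatenating it into the SQL string, keeps client input from altering the generated query. Real web pages are often not well-formed XML, so a production system would likely need a tolerant HTML parser in front of the query step.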
Specification