Intelligent Data Search Engine
Abstract
A method automatically extracts information that matches a predetermined criterion from one or more web pages at one or more web sites and automatically produces one or more extracted data-field names from that information. The extracted information includes at least one extracted data-field value associated with one of the extracted data-field names. If an extracted data-field name matches an existing data-field name in a previously constructed database, which includes one or more data fields each associated with a data-field name and a data-field value, the method updates the data-field value associated with that data-field name in the database. If an extracted data-field name does not match any of the existing data-field names in the database, the method adds the extracted data-field name to the database.
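The update-or-add rule in the abstract can be illustrated with a minimal sketch. It assumes the database is modeled as a plain Python dict mapping data-field names to data-field values; the function name `merge_extracted_fields` is hypothetical, and storing the extracted value together with a newly added name is an assumption, since the abstract only says the name is added.

```python
def merge_extracted_fields(database, extracted):
    """Apply extracted name/value pairs to `database` in place.

    Returns (updated_names, added_names) so the caller can see which
    rule fired for each extracted data-field name.
    """
    updated, added = [], []
    for name, value in extracted.items():
        if name in database:
            # Name matches an existing data-field name: update its value.
            database[name] = value
            updated.append(name)
        else:
            # No match: add the new data-field name (value stored too,
            # which is an assumption of this sketch).
            database[name] = value
            added.append(name)
    return updated, added


db = {"price": "10.00", "brand": "Acme"}
updated, added = merge_extracted_fields(db, {"price": "12.50", "weight": "2 kg"})
```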
22 Claims
1. A computer-implemented method for reverse engineering a first database in a second database from product information displayed on a remote web site, comprising:
- receiving a first remote web site including a plurality of web pages related to one or more products;
- automatically identifying a first cluster of web pages associated with the first remote web site that comprises product information from a first database, the product information displayed as data field values (DFVs) related to the one or more products and located in the same relative order in each web page of the first cluster of web pages;
- automatically deriving a first extraction template from a first sample of web pages within the first cluster of web pages;
- inferring data field names (DFNs) associated with the DFVs; and
- populating a second database, including extracting the DFVs from the first cluster of web pages and storing the DFVs in association with an inferred DFN.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
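The template-derivation, name-inference, and population steps of claim 1 can be sketched as follows. This is an illustrative simplification, not the patented implementation: it assumes every page in the cluster tokenizes to the same number of tokens, treats positions whose token is identical across the sample as template constants and all other positions as DFV slots, and guesses a DFN from the nearest preceding constant text label. All function names are hypothetical. A DFV that happens to be identical across the whole sample would be misread as a constant under this scheme.

```python
import re


def tokenize(page):
    """Split a page into tag tokens and text tokens, dropping blank pieces."""
    parts = re.split(r"(<[^>]+>)", page)
    return [p.strip() for p in parts if p.strip()]


def derive_template(sample_pages):
    """Return a list with the constant token at each position, or None for a DFV slot."""
    token_lists = [tokenize(p) for p in sample_pages]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("sample pages do not share one layout")
    return [col[0] if len(set(col)) == 1 else None for col in zip(*token_lists)]


def infer_field_names(template):
    """Map each slot position to a name taken from the closest preceding text label."""
    names = {}
    for i, token in enumerate(template):
        if token is not None:
            continue
        label = None
        for prev in reversed(template[:i]):
            if prev is None:  # reached another slot: no label belongs to this one
                break
            if not prev.startswith("<"):
                label = prev
                break
        names[i] = label.rstrip(":").strip().lower() if label else f"field_{i}"
    return names


def extract_record(page, template, names):
    """Pull the DFVs out of one page and key them by inferred DFN."""
    tokens = tokenize(page)
    if len(tokens) != len(template):
        raise ValueError("page does not match the template layout")
    return {names[i]: tokens[i] for i in names}


cluster = [
    "<h1>Widget A</h1><p>Price:</p><span>10.00</span>",
    "<h1>Widget B</h1><p>Price:</p><span>12.50</span>",
    "<h1>Widget C</h1><p>Price:</p><span>7.25</span>",
]
template = derive_template(cluster[:2])      # derive from a sample of the cluster
names = infer_field_names(template)
second_database = [extract_record(p, template, names) for p in cluster]
```

Here the product title has no preceding label, so the sketch falls back to a positional name; a real system would need a richer naming heuristic.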
21. A system to reverse engineer a first database in a second database from product information displayed on a remote web site, comprising:
- a first module to receive a first remote web site including a plurality of web pages related to one or more products;
- a second module, coupled in communication with the first module, the second module automatically identifying a first cluster of web pages associated with the first remote web site that comprise product information from a first database;
- a third module, coupled in communication with the second module, the third module automatically deriving a first extraction template from a first sample of web pages within the first cluster of web pages, wherein the web pages comprise product information displayed as data field values (DFVs) related to the one or more products and located in the same relative positions across the first cluster of web pages;
- a fourth module, coupled in communication with the third module, the fourth module inferring data field names (DFNs) of the DFVs; and
- a fifth module, coupled in communication with the fourth module, the fifth module populating a second database, including extracting the DFVs from the first cluster of web pages and storing the DFVs in association with an inferred DFN.
22. A computer-implemented method for reverse engineering a plurality of databases in an aggregate database from product information displayed on a plurality of remote web sites, comprising:
- receiving a plurality of remote web sites including a plurality of web pages related to a plurality of products;
- automatically identifying a plurality of clusters of web pages, each cluster of web pages associated with at least one of the plurality of remote web sites, each remote web site comprising product information from at least one of the plurality of databases, the product information displayed as data field values (DFVs) related to at least one of the plurality of products and located in the same relative order in each web page of an identified cluster of web pages;
- automatically deriving a plurality of extraction templates, each extraction template derived from a sample of web pages within a cluster of web pages from the plurality of clusters of web pages;
- inferring data field names (DFNs) associated with the DFVs; and
- populating an aggregation database, including extracting the DFVs from the plurality of clusters of web pages and storing the DFVs in association with an inferred DFN.
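One way to picture the cluster-identification step across several sites is to group pages whose markup has the same tag sequence. The sketch below is an assumption about how such grouping could work, not the method the patent specifies: the signature (an ordered tuple of tag names), the per-site keying, and the names `tag_signature` and `cluster_pages` are all hypothetical.

```python
import re
from collections import defaultdict


def tag_signature(page):
    """Return the ordered tuple of opening/closing tag names, ignoring attributes and text."""
    return tuple(re.findall(r"<(/?[A-Za-z][A-Za-z0-9]*)", page))


def cluster_pages(pages_by_site):
    """Group each site's pages by structural signature.

    Keys are (site, signature) pairs, so every cluster belongs to one site.
    """
    clusters = defaultdict(list)
    for site, pages in pages_by_site.items():
        for page in pages:
            clusters[(site, tag_signature(page))].append(page)
    return dict(clusters)


pages_by_site = {
    "shop-a.example": [
        "<h1>Widget A</h1><p>Price:</p><span>10.00</span>",
        "<h1>Widget B</h1><p>Price:</p><span>12.50</span>",
        "<h1>About us</h1><p>Contact</p>",
    ],
    "shop-b.example": ["<div>Gadget</div><b>9.99</b>"],
}
clusters = cluster_pages(pages_by_site)
```

Each resulting cluster could then get its own extraction template, with the extracted records written into a single aggregation store.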
Specification