Intelligent data search engine
First Claim
1. A computer-implemented method for extracting product information displayed on a plurality of web pages on a first remote web site representing a first database and storing said product information in a second database, comprising:
requesting and, in response to said request, receiving said plurality of web pages related to one or more products from said first remote website;
automatically identifying a first cluster of web pages associated with said first remote web site that comprises product information from said first database, said product information displayed as data field values (DFVs) related to the one or more products and exhibiting common identification characteristics in each web page of said first cluster of web pages;
structurally comparing a first sample of web pages within said first cluster of web pages and creating an intersection data structure comprising the structural location of said DFVs of database records (DBRs), and inferring data field names (DFNs) associated with keywords, symbols, or patterns, indicating the location of said DFVs in said intersection data structure;
automatically deriving a first extraction template from said intersection data structure associated with said first sample of web pages within said first cluster of web pages;
utilizing said first extraction template for extracting said DFVs from said first cluster of web pages to create, in association with said inferred DFNs, extracted DBRs;
and storing said extracted DBRs in said second database.
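The structural comparison, template derivation, and extraction steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes well-formed pages without unclosed void tags, treats text that is identical across all sample pages as a candidate DFN, and treats text that varies at a shared structural location as a DFV. The names PathTextParser, page_paths, derive_template, extract_record, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class PathTextParser(HTMLParser):
    """Collects text keyed by structural path (tag name plus sibling index).

    Assumption: every start tag has a matching end tag (no bare <br> etc.).
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # path segments of the currently open elements
        self.counts = [{}]  # per-depth counters of child tags seen so far
        self.texts = {}     # path -> text, kept in document order

    def handle_starttag(self, tag, attrs):
        seen = self.counts[-1]
        index = seen.get(tag, 0)
        seen[tag] = index + 1
        self.stack.append(f"{tag}[{index}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Keep the first non-empty text found at each structural location.
            self.texts.setdefault("/".join(self.stack), text)


def page_paths(html):
    """Return {structural path: text} for one page."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


def derive_template(sample_pages):
    """Intersect the sample pages and map each varying path to an inferred DFN."""
    parsed = [page_paths(page) for page in sample_pages]
    first = parsed[0]
    # Intersection: structural locations present in every sample page.
    common = [p for p in first if all(p in other for other in parsed[1:])]
    template, label = {}, None
    for path in common:
        values = {page[path] for page in parsed}
        if len(values) == 1:
            # Constant text across samples: treat it as a field label.
            label = first[path].rstrip(":")
        elif label is not None:
            # Varying text: a DFV slot named by the label just before it.
            template[path] = label
            label = None  # each label names only the next varying slot
    return template


def extract_record(template, html):
    """Apply the template to one page and return a DBR as {DFN: DFV}."""
    texts = page_paths(html)
    return {name: texts[path] for path, name in template.items() if path in texts}


# Hypothetical product pages sharing one layout.
PAGE = (
    "<html><body><table>"
    "<tr><td>Name:</td><td>{name}</td></tr>"
    "<tr><td>Price:</td><td>{price}</td></tr>"
    "<tr><td>Weight:</td><td>{weight}</td></tr>"
    "</table></body></html>"
)
samples = [
    PAGE.format(name="Lamp", price="$10", weight="2 kg"),
    PAGE.format(name="Desk", price="$90", weight="30 kg"),
]
template = derive_template(samples)
record = extract_record(template, PAGE.format(name="Chair", price="$45", weight="7 kg"))
print(record)
```

In this sketch the template is just a mapping from structural paths to inferred field names. A sample in which two products happen to share a value would cause that value to be mistaken for a label, so a larger sample would reduce such errors.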
Abstract
Automatically extracting information that matches a predetermined criterion from one or more web pages at one or more web sites and automatically producing one or more extracted data-field names from the information extracted from the one or more web pages at the one or more web sites. The extracted information includes at least one extracted data-field value associated with one of the one or more extracted data-field names. If one of the extracted data-field names matches an existing data-field name in a previously constructed data base including one or more data fields each associated with a data-field name and a data-field value, the method updates an extracted data-field value associated with the data-field name in the data base. If one of the extracted data field names does not match any of the existing data-field names in the data base, the method adds the extracted data-field name to the data base.
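The update-or-add behavior described in the abstract can be sketched as follows. This is an illustration only, assuming the database is modeled as an in-memory mapping from data-field names to data-field values; the function name merge_extracted_fields is hypothetical, and storing the value together with a newly added name is an assumption, since the abstract only states that the name is added.

```python
def merge_extracted_fields(database, extracted):
    """Merge extracted {data-field name: value} pairs into the database mapping.

    Returns two lists: names whose values were updated, and names newly added.
    """
    updated, added = [], []
    for name, value in extracted.items():
        if name in database:
            # The name matches an existing data-field name: update its value.
            database[name] = value
            updated.append(name)
        else:
            # No match: add the name (assumption: together with its value).
            database[name] = value
            added.append(name)
    return updated, added


# Hypothetical example data.
database = {"Price": "$10"}
updated, added = merge_extracted_fields(database, {"Price": "$12", "Color": "red"})
print(updated, added, database)
```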
17 Claims
Claim 1 is the independent claim, set out in full under First Claim above. Claims 2 through 17 depend on claim 1.
Specification