Method for estimating coverage of Web search engines
First Claim
1. A computerized method for estimating coverage of search engines, each search engine maintaining an index of words of pages located at specific addresses in a network, comprising the steps of:
generating a random query, the random query being a logical combination of words found in a training set of the pages;
submitting the random query to a first search engine;
receiving a set of URLs in response to the random query;
randomly selecting a particular URL identifying a sample page;
generating a strong query for the sample page;
submitting the strong query to a second search engine; and
comparing result information received in response to the strong query to determine if the second search engine has indexed the sample page.
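The claimed steps can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: `ToyEngine`, `random_query`, `strong_query`, `sample_and_check`, and the `fetch` callback are hypothetical names; the random query is assumed to be a simple AND of words; and the strong query is assumed to consist of the page's rarest words, a choice the claim itself does not specify.

```python
import random

class ToyEngine:
    """Hypothetical in-memory stand-in for a search engine (URL -> page text)."""
    def __init__(self, pages):
        self.pages = pages

    def search(self, words):
        # Conjunctive (AND) query: URLs whose page contains every query word.
        return sorted(url for url, text in self.pages.items()
                      if all(w in text.split() for w in words))

def random_query(lexicon, rng, n_words=2):
    # Logical AND of words drawn at random from the training-set lexicon.
    return rng.sample(sorted(lexicon), n_words)

def strong_query(page_text, doc_freq, k=3):
    # Assumption: the k rarest words of the page (by training-set frequency)
    # are distinctive enough to pin down that page.
    words = sorted(set(page_text.split()),
                   key=lambda w: (doc_freq.get(w, 0), w))
    return words[:k]

def sample_and_check(first, second, fetch, lexicon, doc_freq, rng):
    # Submit a random query to the first engine and receive a set of URLs.
    urls = first.search(random_query(lexicon, rng))
    if not urls:
        return None  # nothing matched; the caller may retry with a new query
    # Randomly select one URL as the sample page.
    url = rng.choice(urls)
    # Submit the strong query to the second engine and compare the results.
    results = second.search(strong_query(fetch(url), doc_freq))
    return url, url in results
```

Here the comparison is an exact URL match; a real check would also need to recognize a substantially similar page served under a different URL.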
Abstract
A computerized method is used to estimate the relative coverage of Web search engines. Each search engine maintains an index of words of pages located at specific URL addresses in a network. The method generates a random query, which is a logical combination of words found in a subset of the pages. The random query is submitted to a first search engine. In response, a set of URLs is received, each URL identifying a page indexed by the first search engine that satisfies the random query. A particular URL identifying a sample page is randomly selected. A strong query corresponding to the sample page is generated and submitted to a second search engine. Result information received in response to the strong query is compared to determine whether the second search engine has indexed the sample page, or a page substantially similar to the sample page. This procedure is repeated to gather statistical data, which is used to estimate the relative sizes and amount of overlap of the search engines.
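The abstract does not state how the gathered statistics are turned into estimates. One natural choice, assumed here for illustration, uses the fact that the overlap equals both (fraction of the first engine's samples found in the second) × size(first) and (fraction of the second engine's samples found in the first) × size(second), so the size ratio is the quotient of the two fractions. The function names below are hypothetical.

```python
def overlap_fraction(found, sampled):
    # Estimated fraction of the sampled engine's pages that the other
    # engine has also indexed.
    if sampled == 0:
        raise ValueError("no samples collected")
    return found / sampled

def relative_size(found_1_in_2, sampled_1, found_2_in_1, sampled_2):
    # Assumed estimator: size(E1) / size(E2)
    #   ~= Pr[page sampled from E2 is in E1] / Pr[page sampled from E1 is in E2]
    p_1_in_2 = overlap_fraction(found_1_in_2, sampled_1)
    p_2_in_1 = overlap_fraction(found_2_in_1, sampled_2)
    if p_1_in_2 == 0:
        raise ValueError("no overlap observed; ratio is undefined")
    return p_2_in_1 / p_1_in_2
```

For example, if 40 of 100 pages sampled from the first engine are found in the second, and 80 of 100 pages sampled from the second are found in the first, the first engine is estimated to be about twice the size of the second.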
Specification