Search engine and method with improved relevancy, scope, and timeliness
First Claim
1. A computer implemented method for adaptive feedback ensuring timeliness of a collection of web pages, the method comprising:
- extracting a uniform resource locator (URL) from a search result; and
- determining whether or not a web page corresponding to the URL is present in whole or in part in the collection, wherein:
  - when the web page is determined to be present in the collection, downloading and replacing the web page in the collection in accordance with a first probability, wherein the first probability depends on an age of the web page in the collection; and
  - when the web page is determined not to be present in the collection, downloading and including the web page in the collection, further includes extracting hyperlinks from the web page corresponding to the URL and downloading the web pages corresponding to the hyperlinks each with a second probability, wherein the second probability depends on the number of hyperlinks in the web page.
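The steps recited in the claim can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the claim only says that the first probability depends on the age of the stored page and that the second depends on the number of hyperlinks, so the exponential form of `refresh_probability`, the `budget / num_links` form of `follow_probability`, and the `half_life` and `budget` parameters are all assumptions. The `fetch` and `extract_links` callables are hypothetical stand-ins for a downloader and a link extractor, and storing the followed pages in the collection is also an assumption.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CachedPage:
    body: str
    fetched_at: float  # time of download, in seconds


def refresh_probability(age_seconds: float, half_life: float = 86400.0) -> float:
    """First probability. Assumed form: rises from 0 toward 1 as the copy ages."""
    return 1.0 - math.exp(-age_seconds * math.log(2) / half_life)


def follow_probability(num_links: int, budget: float = 5.0) -> float:
    """Second probability. Assumed form: about `budget` links followed per page."""
    if num_links <= 0:
        return 0.0
    return min(1.0, budget / num_links)


def process_result_url(
    url: str,
    collection: Dict[str, CachedPage],
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
    now: float,
    rng: random.Random,
) -> str:
    """Apply the feedback step to one URL taken from a search result."""
    page = collection.get(url)
    if page is not None:
        # Page is present: download and replace it with the first probability.
        if rng.random() < refresh_probability(now - page.fetched_at):
            collection[url] = CachedPage(fetch(url), now)
            return "refreshed"
        return "kept"

    # Page is absent: download and include it, then follow each hyperlink
    # independently with the second probability.
    body = fetch(url)
    collection[url] = CachedPage(body, now)
    links = extract_links(body)
    p_follow = follow_probability(len(links))
    for link in links:
        if rng.random() < p_follow:
            collection[link] = CachedPage(fetch(link), now)
    return "added"
```

With these assumed forms, a page that was just downloaded is never refreshed immediately, a page one `half_life` old is refreshed about half the time, and a page with many hyperlinks has each link followed less often than a page with few.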
Abstract
A search engine and a method achieve timeliness of documents returned in a search result by a relevancy feedback mechanism driven by the frequency with which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes which determine whether or not a cached or indexed web page associated with a URL in the search result should be refreshed. In addition, the random processes also determine whether or not hyperlinks in the cached or indexed web page should be followed to access related web pages. Accesses of web pages resulting from the operations of the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on anchor texts of extracted hyperlinks of web documents.
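As a rough illustration of the last point, a lexicon could be collected from the anchor texts of hyperlinks found in downloaded pages. The sketch below uses Python's standard `html.parser` module. The class name `AnchorTextParser`, the function `build_lexicon`, and the choice to store lower-cased term counts are assumptions made for illustration; the abstract does not say how the lexicon is represented or how it enters the scoring function.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Iterable, List, Optional, Tuple


class AnchorTextParser(HTMLParser):
    """Collects (href, anchor text) pairs from the <a href=...> elements of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._buf = []

    def handle_data(self, data):
        # Only text that appears inside an open anchor is recorded.
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            self.anchors.append((self._href, text))
            self._href = None


def build_lexicon(html_docs: Iterable[str]) -> Counter:
    """Counts lower-cased anchor-text terms across documents (assumed representation)."""
    lexicon: Counter = Counter()
    for doc in html_docs:
        parser = AnchorTextParser()
        parser.feed(doc)
        parser.close()
        for _href, text in parser.anchors:
            lexicon.update(text.lower().split())
    return lexicon
```

Only words that appear inside links contribute to the lexicon; ordinary body text is ignored.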
16 Claims
1. A computer implemented method for adaptive feedback ensuring timeliness of a collection of web pages, the method comprising:
- extracting a uniform resource locator (URL) from a search result; and
- determining whether or not a web page corresponding to the URL is present in whole or in part in the collection, wherein:
  - when the web page is determined to be present in the collection, downloading and replacing the web page in the collection in accordance with a first probability, wherein the first probability depends on an age of the web page in the collection; and
  - when the web page is determined not to be present in the collection, downloading and including the web page in the collection, further includes extracting hyperlinks from the web page corresponding to the URL and downloading the web pages corresponding to the hyperlinks each with a second probability, wherein the second probability depends on the number of hyperlinks in the web page.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An adaptive feedback system for ensuring timeliness of a collection of web pages, the system comprising:
- a query processor that extracts a uniform resource locator (URL) from a search result;
- a cache including a process for determining whether or not a web page corresponding to the URL is present in whole or in part in the collection;
- a web crawler coupled to the processor, wherein:
  - when the web page is determined to be present in the collection, the web crawler downloads and replaces the web page in the collection in accordance with a first probability, wherein the first probability depends on an age of the web page in the collection; and
  - when the web page is determined not to be present in the collection, the web crawler downloads and includes the web page in the collection, wherein the processor extracts hyperlinks from the web page corresponding to the URL and directs the web crawler to download the web pages corresponding to the hyperlinks each with a second probability, wherein the second probability depends on the number of hyperlinks in the web page.

Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification