SEARCH ENGINE AND METHOD WITH IMPROVED RELEVANCY, SCOPE, AND TIMELINESS
Abstract
A search engine and a method achieve timeliness of documents returned in a search result by a relevancy feedback mechanism driven by the frequency with which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes which determine whether or not a cached or indexed web page associated with a URL in the search result should be refreshed. In addition, the random processes also determine whether or not hyperlinks in the cached or indexed web page should be followed to access related web pages. Accesses of web pages resulting from the operations of the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on anchortexts of extracted hyperlinks of web documents.
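The refresh decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how return frequency maps to a refresh probability, so the linear mapping, the `base` and `gain` parameters, and the function names below are all assumptions made for illustration, not the patented mechanism.

```python
import random
from collections import Counter


def refresh_probability(hits, total_hits, base=0.05, gain=0.9):
    """Hypothetical mapping from return frequency to a refresh probability.

    hits: times this URL was returned in recent searches.
    total_hits: total URL returns observed in the same window.
    base, gain: assumed tuning constants, not taken from the source.
    """
    if total_hits == 0:
        return base
    return min(1.0, base + gain * hits / total_hits)


def select_for_refresh(result_urls, recent_counts, rng):
    """Run one random draw per URL in a search result.

    recent_counts is a Counter, so URLs never seen before count as 0.
    Returns the URLs whose cached or indexed page should be re-fetched.
    """
    total = sum(recent_counts.values())
    chosen = []
    for url in result_urls:
        p = refresh_probability(recent_counts[url], total)
        if rng.random() < p:
            chosen.append(url)
    return chosen


if __name__ == "__main__":
    counts = Counter({"http://example.com/news": 40, "http://example.com/old": 1})
    urls = ["http://example.com/news", "http://example.com/old"]
    print(select_for_refresh(urls, counts, random.Random(0)))
```

Under these assumptions, frequently returned URLs are refreshed more often, while rarely returned ones still get an occasional refresh through the `base` term. A second, independent draw of the same kind could decide whether to follow a page's hyperlinks.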
30 Claims
1. A method for providing a training set to build a statistical relevancy scoring function to be used in a search engine, comprising:
(a) identifying an initial set of hypertext documents in a collection of documents as a training set of relevant documents;
(b) identifying hyperlinks included in each hypertext document of the training set; and
(c) including the hypertext documents pointed to by the identified hyperlinks.
Dependent claims: 2-15
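Steps (a) through (c) of claim 1 can be sketched in Python as follows. This is an illustrative reading only: the in-memory `collection` dictionary (URL to HTML), the names `LinkExtractor` and `expand_training_set`, and the choice to keep only link targets that exist in the collection are assumptions, and the sketch follows links one level deep because the claim does not state whether the expansion repeats.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (step b)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def expand_training_set(initial_urls, collection):
    """Return the initial set (step a) plus the documents it links to (step c).

    collection: hypothetical dict mapping URL -> HTML text.
    Order is preserved and each URL appears at most once.
    """
    seen = set()
    training = []
    for url in initial_urls:
        if url not in seen:
            seen.add(url)
            training.append(url)
    for url in list(training):
        parser = LinkExtractor(url)
        parser.feed(collection.get(url, ""))
        for target in parser.links:
            # Assumption: only documents present in the collection are added.
            if target in collection and target not in seen:
                seen.add(target)
                training.append(target)
    return training
```

The expanded list can then serve as the set of documents presumed relevant when fitting a statistical scoring function.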
16. A method for providing a relevancy scoring function for scoring documents in a search result, comprising:
compiling a lexicon including terms used in search queries that are input to a search engine; and
for each term in the lexicon:
identifying from a corpus of documents those documents in which the term appears;
computing a document frequency based on the relative numbers of the identified documents and the documents in the corpus; and
computing a term frequency for each identified document based on the number of times the term appears in the document.
Dependent claims: 17-30
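The per-term statistics of claim 16 can be sketched in Python as follows. The claim says only that the document frequency is "based on the relative numbers" of matching documents and corpus documents, so the plain ratio used here is an assumption, as are the whitespace tokenization, the lower-casing, and the name `build_statistics`.

```python
from collections import Counter


def build_statistics(lexicon, corpus):
    """Compute document frequency and term frequencies for each lexicon term.

    lexicon: iterable of query terms.
    corpus: hypothetical dict mapping document id -> plain text.
    Returns {term: {"df": float, "tf": {doc_id: count}}}.
    """
    # Assumed tokenization: lower-case, split on whitespace.
    token_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()
    }
    n_docs = len(corpus)
    stats = {}
    for term in lexicon:
        key = term.lower()
        # Term frequency: occurrences of the term in each document containing it.
        tf = {
            doc_id: counts[key]
            for doc_id, counts in token_counts.items()
            if counts[key] > 0
        }
        # Document frequency: share of corpus documents containing the term.
        df = len(tf) / n_docs if n_docs else 0.0
        stats[key] = {"df": df, "tf": tf}
    return stats


if __name__ == "__main__":
    corpus = {
        "d1": "search engine search",
        "d2": "web engine",
        "d3": "index",
    }
    for term, values in build_statistics(["search", "engine"], corpus).items():
        print(term, values)
```

A scoring function could combine these two quantities in many ways; the sketch stops at the statistics the claim names and does not assume a particular weighting.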
Specification