Search engine and method with improved relevancy, scope, and timeliness
Abstract
A search engine and a method achieve timeliness of documents returned in a search result by a relevancy feedback mechanism driven by the frequency with which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes which determine whether or not a cached or indexed web page associated with a URL in the search result should be refreshed. In addition, the random processes also determine whether or not hyperlinks in the cached or indexed web page should be followed to access related web pages. Accesses of web pages resulting from the operations of the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on anchortexts of extracted hyperlinks of web documents.
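The feedback mechanism summarized above can be illustrated with a minimal sketch. It is not taken from the specification: the function name `feedback_actions`, the probabilities `p_refresh` and `p_follow`, and the choice to make the two random draws independent are all assumptions made for illustration. The point it shows is that a URL returned in many recent searches is drawn many times, so it is refreshed more often in expectation.

```python
import random


def feedback_actions(result_urls, p_refresh=0.1, p_follow=0.05, rng=None):
    """Decide, per returned URL, whether to refresh it and follow its links.

    Hypothetical helper: p_refresh and p_follow are assumed tuning values,
    not figures from the patent.
    """
    rng = rng or random.Random()
    to_refresh, to_follow = [], []
    for url in result_urls:
        # One random draw per appearance in a search result: URLs that are
        # returned often get more chances to be refreshed.
        if rng.random() < p_refresh:
            to_refresh.append(url)
        # Assumed to be an independent draw for following hyperlinks.
        if rng.random() < p_follow:
            to_follow.append(url)
    return to_refresh, to_follow
```

In such a sketch, pages selected for refresh or link-following would then be fetched and used to update the document index; that fetching step is omitted here.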
19 Claims
1. A method for providing a training set to build a statistical relevancy scoring function for a document relative to selected terms in a lexicon, comprising:
in a search engine that accesses servers of documents in a computer network,
(a) identifying an initial set of hypertext documents in a collection of documents as a training set of relevant documents;
(b) identifying hyperlinks included in each hypertext document of the training set;
(c) including in the training set the hypertext documents pointed to by the identified hyperlinks;
(d) identifying anchortexts associated with the hypertext documents of the training set; and
(e) including the anchortexts in the lexicon;
wherein the statistical scoring function is determined by combining individual contributions to the statistical scoring function by each of the selected terms, wherein the individual contribution by each selected term is related to a term frequency, being the frequency of occurrence of that selected term in the document, and a document frequency, being the number of documents in the collection of documents that include that selected term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
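Steps (a) through (e) of claim 1 can be sketched on a toy in-memory collection. This is only an illustration under stated assumptions: the collection is modeled as a dictionary mapping each URL to its list of `(anchortext, target_url)` pairs, the function name `build_training_set` is hypothetical, and step (d) is read as collecting the anchortexts of hyperlinks found in the training documents, which is one possible interpretation of the claim language.

```python
def build_training_set(collection, seed_urls):
    """Expand a seed set by one hop of hyperlinks and collect anchortexts.

    collection: dict mapping url -> list of (anchortext, target_url) pairs
                (an assumed toy representation of hypertext documents).
    """
    # (a) the initial set of relevant documents
    training = set(seed_urls)

    # (b) hyperlinks included in each document of the initial set
    links = [
        (text, target)
        for url in seed_urls
        for text, target in collection.get(url, [])
    ]

    # (c) add the documents those hyperlinks point to
    training.update(target for _, target in links if target in collection)

    # (d) + (e) gather anchortexts from the training documents into the lexicon
    lexicon = set()
    for url in training:
        for text, _ in collection.get(url, []):
            lexicon.add(text.strip().lower())

    return training, lexicon
```

For example, with a seed document `a` that links to `b` using the anchortext "search engines", the sketch adds `b` to the training set and "search engines" to the lexicon.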
12. A method for evaluating a relevancy scoring function for scoring documents in a search engine that accesses servers of documents in a computer network, comprising:
compiling a lexicon;
for each term in the lexicon:
identifying, from a corpus of documents, documents in which the term appears;
computing a document frequency that relates linearly to a ratio of the number of the identified documents to the number of documents in the corpus;
computing a term frequency for each identified document based on the number of times the term appears in the document; and
deriving an individual contribution by the term to the relevancy scoring function using the computed term frequency and the computed document frequency;
receiving a search query including one or more terms present in the lexicon;
recovering a collection of documents based on the terms in the search query; and
evaluating the relevancy scoring function for each recovered document by combining the derived individual contributions by the terms in the search query to the relevancy scoring function.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
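The flow of claim 12 can be sketched as a precomputed look-up table of per-term contributions that is summed at query time. The claim does not fix how term frequency and document frequency are combined, so the formula used here, `tf * log(1 / df)`, is an assumed TF-IDF-style choice and not the patent's function. The names `build_contributions` and `score` are hypothetical, and terms are assumed to be single whitespace-delimited words.

```python
import math
from collections import Counter


def build_contributions(corpus, lexicon):
    """Precompute a (term, doc_id) -> contribution look-up table.

    corpus: dict mapping doc_id -> document text (assumed toy representation).
    """
    n_docs = len(corpus)
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()}
    table = {}
    for term in lexicon:
        # Documents in which the term appears.
        hits = [doc_id for doc_id, c in counts.items() if c[term] > 0]
        if not hits:
            continue
        # Document frequency as the ratio of matching documents to corpus size.
        df = len(hits) / n_docs
        for doc_id in hits:
            tf = counts[doc_id][term]  # times the term appears in the document
            # Assumed combination; the claim only requires using tf and df.
            table[(term, doc_id)] = tf * math.log(1.0 / df)
    return table


def score(table, query_terms, doc_ids):
    """Combine per-term contributions by summation for each recovered document."""
    return {
        doc_id: sum(table.get((term, doc_id), 0.0) for term in query_terms)
        for doc_id in doc_ids
    }
```

Under this assumed formula, a term that appears in every document contributes zero, and rarer terms weigh more heavily in the combined score.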
Specification