Scheduler for search engine crawler
Abstract
A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
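As an illustration of the abstract, the following is a minimal sketch of how a content change frequency might be estimated from a history log and compared against a threshold. The abstract does not say how the frequency is computed, so everything here is an assumption made for illustration: the `HistoryRecord` layout, the use of content checksums to detect changes, the changes-per-day estimate, and the choice to use the change frequency itself as the first score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    """One crawl observation (hypothetical log layout, not taken from the patent)."""
    url: str
    crawl_time: float      # seconds since an arbitrary epoch
    content_checksum: str  # fingerprint of the content fetched at crawl_time


def change_frequencies(history_log):
    """Estimate observed content changes per day for each URL in the log."""
    by_url = defaultdict(list)
    for rec in history_log:
        by_url[rec.url].append(rec)

    freqs = {}
    for url, recs in by_url.items():
        recs.sort(key=lambda r: r.crawl_time)
        # Count consecutive crawls whose checksums differ.
        changes = sum(
            1 for prev, cur in zip(recs, recs[1:])
            if prev.content_checksum != cur.content_checksum
        )
        span_days = (recs[-1].crawl_time - recs[0].crawl_time) / 86400.0
        # With a single observation there is no evidence of change.
        freqs[url] = changes / span_days if span_days > 0 else 0.0
    return freqs


def schedule(history_log, threshold):
    """Return URLs whose score exceeds the threshold.

    Assumption: the first score is simply the estimated change frequency.
    """
    freqs = change_frequencies(history_log)
    return sorted(url for url, freq in freqs.items() if freq > threshold)
```

For example, a URL whose checksum differed on each of three crawls taken one day apart would get an estimated frequency of one change per day under this sketch, while a URL whose content never changed would get zero and would not be scheduled for any positive threshold.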
42 Claims
1. A method of scheduling document indexing, comprising:
at a computing system having one or more processors and memory storing programs for execution by the one or more processors:
retrieving a number of document identifiers, each document identifier identifying a corresponding document on a network; and
for each retrieved document identifier and its corresponding document,
determining a query-independent score indicative of a rank of the corresponding document relative to other documents in a set of documents;
determining a first score for the document identifier that is a function of the determined query-independent score, a determined content change frequency of the corresponding document, and an age of the corresponding document;
comparing the first score against a threshold value thereby obtaining a result, wherein the threshold value is a function of a speed of the engine crawler system; and
conditionally scheduling the corresponding document for indexing based on the result.
21. A computing system, comprising:
one or more processors;
memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
retrieving a number of document identifiers, each document identifier identifying a corresponding document on a network; and
for each retrieved document identifier and its corresponding document,
determining a query-independent score indicative of a rank of the corresponding document relative to other documents in a set of documents;
determining a first score for the document identifier that is a function of the determined query-independent score, a determined content change frequency of the corresponding document, and an age of the corresponding document;
comparing the first score against a threshold value thereby obtaining a result, wherein the threshold value is a function of a speed of the engine crawler system; and
conditionally scheduling the corresponding document for indexing based on the result.
41. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer system with one or more processors, cause the computer system to:
retrieve a number of document identifiers, each document identifier identifying a corresponding document on a network; and
for each retrieved document identifier and its corresponding document,
determine a query-independent score indicative of a rank of the corresponding document relative to other documents in a set of documents;
determine a first score for the document identifier that is a function of the determined query-independent score, a determined content change frequency of the corresponding document, and an age of the corresponding document;
compare the first score against a threshold value thereby obtaining a result, wherein the threshold value is a function of a speed of the engine crawler system; and
conditionally schedule the corresponding document for indexing based on the result.
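The steps recited in the independent claims can be sketched roughly as follows. The claims say only that the first score is "a function of" the query-independent score, the content change frequency, and the age, and that the threshold is "a function of" the crawler speed. The specific choices below are therefore assumptions for illustration only: the product form of the score, the reading of "age" as days since the last crawl, the inverse relationship between crawler speed and threshold, and the `base` constant. All names (`DocInfo`, `first_score`, `threshold_for`, `schedule_for_indexing`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocInfo:
    """Per-document inputs to the scheduler (hypothetical structure)."""
    url: str                        # document identifier
    query_independent_score: float  # e.g. a link-based rank, assumed >= 0
    change_frequency: float         # estimated content changes per day
    age_days: float                 # assumption: days since the last crawl


def first_score(doc):
    """Combine the three inputs. The product form is an assumption."""
    return doc.query_independent_score * doc.change_frequency * doc.age_days


def threshold_for(crawler_speed, base=1000.0):
    """Map crawler speed (documents per day) to a threshold.

    Assumption: a faster crawler can afford a lower threshold, so the
    threshold is taken to be inversely proportional to the speed.
    """
    if crawler_speed <= 0:
        raise ValueError("crawler_speed must be positive")
    return base / crawler_speed


def schedule_for_indexing(docs, crawler_speed):
    """Return the URLs whose first score meets the speed-dependent threshold."""
    threshold = threshold_for(crawler_speed)
    return [doc.url for doc in docs if first_score(doc) >= threshold]
```

Under these assumptions, raising `crawler_speed` lowers the threshold, so more documents are scheduled per pass; a slower crawler schedules only the documents that are highly ranked, change often, and have not been crawled recently.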
Specification