
Scheduler for search engine crawler

  • US 7,725,452 B1
  • Filed: 05/20/2004
  • Issued: 05/25/2010
  • Est. Priority Date: 07/03/2003
  • Status: Active Grant
First Claim

1. A method of scheduling document indexing, comprising:

  • at a search engine crawler system having one or more processors and memory storing programs for execution by the one or more processors:

    retrieving a number of document identifiers, each document identifier identifying a corresponding document on a network; and

    for each retrieved document identifier and its corresponding document, determining a query-independent score indicative of a page rank of the corresponding document relative to other documents in a set of documents;

    determining a content change frequency of the corresponding document by comparing information stored for successive downloads of the corresponding document;

    determining an age of the corresponding document, wherein the age is associated with the time of the last download of the corresponding document by the crawler system;

    determining a first score for the document identifier that is a function of the determined query-independent score and the determined content change frequency and the determined age of the corresponding document;

    comparing the first score against a threshold value; and

    conditionally scheduling the document for indexing based on the result of the comparison.
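Claim 1 does not say how the three signals are combined into the first score, how change frequency is computed from successive downloads, or which side of the threshold triggers scheduling. The Python sketch below is one possible reading under loudly stated assumptions, not the patented implementation: content changes are detected by comparing SHA-256 fingerprints of successive downloads; change frequency is observed changes per day over the download history; the first score is the hypothetical product page_rank × change_frequency × age (importance times the expected number of changes since the last download); and a document is scheduled when its score exceeds the threshold. The names `DownloadRecord`, `DocumentInfo`, `first_score`, and `schedule_for_indexing` are illustrative only, and every document is assumed to have at least one stored download.

```python
from dataclasses import dataclass
from typing import List
import hashlib


@dataclass
class DownloadRecord:
    """One stored download of a document (hypothetical layout)."""
    timestamp: float   # download time, in days since some epoch
    content_hash: str  # fingerprint of the downloaded content


@dataclass
class DocumentInfo:
    """Per-document crawl state (hypothetical layout)."""
    doc_id: str                      # document identifier, e.g. a URL
    page_rank: float                 # query-independent score
    downloads: List[DownloadRecord]  # at least one entry, oldest first


def fingerprint(content: bytes) -> str:
    """Hash page content so successive downloads can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def change_frequency(downloads: List[DownloadRecord]) -> float:
    """Observed content changes per day across successive downloads."""
    if len(downloads) < 2:
        return 0.0
    changes = sum(
        1
        for prev, cur in zip(downloads, downloads[1:])
        if prev.content_hash != cur.content_hash
    )
    span = downloads[-1].timestamp - downloads[0].timestamp
    return changes / span if span > 0 else 0.0


def age(downloads: List[DownloadRecord], now: float) -> float:
    """Days elapsed since the most recent download."""
    return now - downloads[-1].timestamp


def first_score(doc: DocumentInfo, now: float) -> float:
    # Hypothetical combination: importance multiplied by the expected
    # number of content changes since the last download (rate x age).
    return doc.page_rank * change_frequency(doc.downloads) * age(doc.downloads, now)


def schedule_for_indexing(docs: List[DocumentInfo], now: float,
                          threshold: float) -> List[str]:
    """Return identifiers of documents whose first score exceeds the threshold."""
    return [d.doc_id for d in docs if first_score(d, now) > threshold]


v1, v2 = fingerprint(b"version 1"), fingerprint(b"version 2")
docs = [
    # Two observed changes over two days: one change per day.
    DocumentInfo("http://example.com/news", 0.9,
                 [DownloadRecord(0.0, v1), DownloadRecord(1.0, v2),
                  DownloadRecord(2.0, v1)]),
    # Identical content on both downloads: change frequency 0.
    DocumentInfo("http://example.com/about", 0.9,
                 [DownloadRecord(0.0, v1), DownloadRecord(4.0, v1)]),
]
print(schedule_for_indexing(docs, now=5.0, threshold=1.0))
# ['http://example.com/news']
```

With equal page rank, the frequently changing page (score 0.9 × 1.0 × 3.0 = 2.7) clears the threshold, while the unchanged page scores 0 no matter how old its copy is. Multiplying change rate by age is only one way to estimate staleness; an additive or logarithmic combination would satisfy the claim language equally well.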

  • 2 Assignments