
Scheduler for search engine crawler

  • US 8,707,313 B1
  • Filed: 02/18/2011
  • Issued: 04/22/2014
  • Est. Priority Date: 07/03/2003
  • Status: Expired due to Fees
First Claim

1. A scheduler system for a search engine crawler, comprising:

    a memory for storing a set of document identifiers corresponding to documents on a network and associated status data collected during one or more previous crawls by the search engine crawler; and

    a plurality of schedulers configured to select a subset of the document identifiers for crawling, at least some of the schedulers configured to compute priority scores for the subset of document identifiers and to schedule for crawling at least a portion of the subset of document identifiers based on the priority scores and status data;

    wherein the scheduling includes removing from the subset one or more document identifiers that were unreachable in a plurality of consecutive prior crawls or had download errors in a plurality of consecutive prior crawls; and

    wherein each priority score is computed as a product of a respective page importance and a respective boost factor, the respective boost factor being a scalar that is used to demote or promote the priority score of the respective document identifier.
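
The scoring and removal steps recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation: the names `UrlRecord`, `priority_score`, `schedule`, and the threshold `MAX_CONSECUTIVE_FAILURES` are hypothetical, and the threshold of two is only an assumption consistent with "a plurality of consecutive prior crawls". The sketch also uses a single scheduling function, whereas the claim recites a plurality of schedulers.

```python
from dataclasses import dataclass

# Assumption: "a plurality" is read as two or more consecutive failures.
MAX_CONSECUTIVE_FAILURES = 2


@dataclass
class UrlRecord:
    """Hypothetical per-document status data kept from previous crawls."""
    url: str
    page_importance: float
    boost_factor: float = 1.0          # scalar: >1 promotes, <1 demotes
    consecutive_unreachable: int = 0
    consecutive_download_errors: int = 0


def priority_score(rec: UrlRecord) -> float:
    # Priority score as a product of page importance and boost factor.
    return rec.page_importance * rec.boost_factor


def schedule(records: list[UrlRecord], limit: int) -> list[str]:
    # Remove identifiers that repeatedly failed in consecutive prior crawls.
    kept = [
        r for r in records
        if r.consecutive_unreachable < MAX_CONSECUTIVE_FAILURES
        and r.consecutive_download_errors < MAX_CONSECUTIVE_FAILURES
    ]
    # Schedule the highest-scoring identifiers first.
    kept.sort(key=priority_score, reverse=True)
    return [r.url for r in kept[:limit]]


records = [
    UrlRecord("http://example.com/a", page_importance=0.5, boost_factor=2.0),
    UrlRecord("http://example.com/b", page_importance=0.8),
    UrlRecord("http://example.com/c", page_importance=0.9, boost_factor=0.5),
    UrlRecord("http://example.com/d", page_importance=1.0, consecutive_unreachable=3),
]
scheduled = schedule(records, limit=2)
```

In this example the boost factor promotes document `a` above `b` despite its lower page importance, demotes `c`, and `d` is dropped before scoring because it was unreachable in several consecutive crawls.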
