Method and apparatus for managing a backlog of pending URL crawls

  • US 8,676,783 B1
  • Filed: 06/28/2011
  • Issued: 03/18/2014
  • Est. Priority Date: 06/28/2011
  • Status: Active Grant
First Claim

1. A method of reducing a URL crawl backlog in view of a limited URL crawl capacity, for use with a URL crawler executed by a computing device, comprising:

  • receiving a set of pending URL crawl requests at the URL crawler, each URL crawl request arriving with an assigned priority;

  • placing, into a backlog data structure, a first sub-set of the set of pending URL crawl requests, the backlog data structure having an associated maximum wait time;

  • rejecting from the backlog data structure a second sub-set of the set of pending URL crawl requests having priorities failing a priority threshold, such that said rejecting happens without the pending URL crawl requests in the second sub-set being performed, and such that said rejecting happens without the pending URL crawl requests in the second sub-set waiting in the backlog data structure until the maximum wait time; and

  • adjusting the priority threshold based on an estimate of a probability that newly requested URL crawl requests will be satisfied.
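
The four steps of the claim can be illustrated with a minimal Python sketch. This is not the patented implementation: the class `CrawlBacklog`, its parameters (`capacity_per_tick`, `max_wait_ticks`, `target`, `step`), and in particular the probability estimator are hypothetical names and assumptions chosen for illustration. The claim does not say how the probability is estimated, so the sketch assumes it is the fraction of the current backlog that the crawler could serve within the maximum wait time, and it assumes "failing" the threshold means having a priority below it.

```python
import heapq
import itertools


class CrawlBacklog:
    """Illustrative backlog with a self-adjusting priority threshold."""

    def __init__(self, capacity_per_tick, max_wait_ticks,
                 threshold=0.0, target=0.9, step=0.1):
        self.capacity_per_tick = capacity_per_tick  # crawls performed per tick
        self.max_wait_ticks = max_wait_ticks        # maximum wait time, in ticks
        self.threshold = threshold                  # current priority threshold
        self.target = target                        # desired satisfaction probability (assumed)
        self.step = step                            # threshold adjustment size (assumed)
        self._heap = []                             # entries: (-priority, seq, url)
        self._seq = itertools.count()               # tie-breaker keeps FIFO order
        self.rejected = []                          # URLs rejected at admission

    def submit(self, url, priority):
        """Admit a request, or reject it immediately if it fails the threshold."""
        if priority < self.threshold:
            # Rejected without being crawled and without waiting in the backlog.
            self.rejected.append(url)
            return False
        heapq.heappush(self._heap, (-priority, next(self._seq), url))
        return True

    def estimate_satisfaction_probability(self):
        """Assumed estimator: share of the backlog servable within the max wait."""
        if not self._heap:
            return 1.0
        reachable = self.capacity_per_tick * self.max_wait_ticks
        return min(1.0, reachable / len(self._heap))

    def adjust_threshold(self):
        """Raise the threshold when the estimate is below target, else relax it."""
        if self.estimate_satisfaction_probability() < self.target:
            self.threshold += self.step
        else:
            self.threshold = max(0.0, self.threshold - self.step)
        return self.threshold

    def crawl_tick(self):
        """Pop up to capacity_per_tick highest-priority URLs for crawling."""
        count = min(self.capacity_per_tick, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(count)]
```

In this sketch a caller would invoke `submit` for each arriving request, call `crawl_tick` once per scheduling interval, and call `adjust_threshold` periodically, so that low-priority requests are turned away up front when the backlog exceeds what the crawl capacity can serve within the maximum wait time.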

  • 3 Assignments