System and method for enforcing politeness while scheduling downloads in a web crawler

  • US 6,321,265 B1
  • Filed: 11/02/1999
  • Issued: 11/20/2001
  • Est. Priority Date: 11/02/1999
  • Status: Expired due to Term
First Claim

1. A method of downloading data sets from among a plurality of host computers, comprising:

  • (a) obtaining at least one referring data set that includes addresses of one or more referred data sets;

    each referred data set address including a host address;

    (b) enqueuing the referred data set addresses in a plurality of queues, including enqueuing those of the referred data set addresses sharing a respective common host address into a respective common one of the queues;

    (c) assigning a next download time to each of the queues that has enqueued therein at least one referred data set address;

    (d) substantially concurrently operating a plurality of threads, wherein the number of queues is at least as great as the number of threads;

    (e) while operating each thread, repeatedly performing steps of:

    (e1) selecting one of the queues not selected by any of the other threads, in accordance with the next download times assigned to the queues not selected by any of the other threads;

    (e2) downloading a referred data set corresponding to a referred data set address in the selected queue, processing the downloaded referred data set, dequeuing the referred data set address from the selected queue;

    (e3) when the selected queue is not empty after the dequeuing step, assigning an updated next download time to the selected queue; and

    (e4) deselecting the selected queue;

    wherein the enqueuing of referred data set addresses sharing a respective common host address to a respective common one of the queues in step (b) ensures that the downloading in step (e2) by the plurality of threads does not simultaneously download more than one referred data set from any of the host computers.
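The steps of claim 1 can be illustrated with a short sketch. The code below is not taken from the patent; it is a minimal Python illustration under stated assumptions: one queue per host (the claim only requires that addresses sharing a host share a queue), a fixed `delay` between downloads from the same queue, and a caller-supplied `fetch` function. The names `crawl`, `fetch`, `num_threads`, and `delay` are hypothetical.

```python
import heapq
import threading
import time
from collections import deque
from urllib.parse import urlsplit


def crawl(urls, fetch, num_threads=4, delay=0.01):
    """Fetch every URL with at most one download in flight per host.

    Hypothetical sketch: `fetch` is any callable taking a URL, and
    `delay` is an assumed fixed gap between downloads from one queue.
    """
    # Step (b): addresses sharing a host go into the same queue.
    # One queue per host is an assumption made for simplicity.
    queues = {}
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)

    # Step (c): assign a next download time to each non-empty queue.
    # The heap holds only queues not currently selected by a thread.
    start = time.monotonic()
    ready = [(start, host) for host in queues]
    heapq.heapify(ready)
    lock = threading.Lock()
    results = []

    def worker():
        while True:
            # Step (e1): select the unselected queue with the earliest
            # next download time. Popping it hides it from other threads.
            with lock:
                if not ready:
                    return  # nothing left for this thread to select
                when, host = heapq.heappop(ready)
            time.sleep(max(0.0, when - time.monotonic()))

            # Step (e2): download, process, and dequeue the address.
            # Only this thread holds the queue, so peeking is safe.
            url = queues[host][0]
            body = fetch(url)
            with lock:
                results.append((url, body))
                queues[host].popleft()
                # Steps (e3) and (e4): if the queue is not empty, assign
                # an updated time; pushing it back deselects the queue.
                if queues[host]:
                    heapq.heappush(ready, (time.monotonic() + delay, host))

    # Step (d): keep the number of queues >= the number of threads.
    count = min(num_threads, len(queues))
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a queue is removed from the heap while a thread works on it, no second thread can select the same host until the first thread has finished and pushed the queue back, which is the property the final "wherein" clause describes. In this simplified sketch a thread exits as soon as it finds no unselected queue, so parallelism shrinks as queues drain.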

  • 7 Assignments