
Method for downloading high-volumes of content from the internet without adversely effecting the source of the content or being detected

  • US 7,516,194 B1
  • Filed: 05/21/2003
  • Issued: 04/07/2009
  • Est. Priority Date: 05/21/2002
  • Status: Expired due to Fees
First Claim

1. A system for downloading a plurality of documents from a plurality of content servers, said content servers being linked to a plurality of routers that each have a different network address, said system comprising:

  • a plurality of pullers;

  • a director for:

    • creating a list of URLs of the plurality of documents to be downloaded from the plurality of content servers, each of the plurality of said documents being identified by a different URL; and

    • assigning a portion of the list of URLs to each of the pullers such that each portion assigned to a particular puller includes all documents to be retrieved from a single content server wherein no two pullers initiate requests to adjacent URLs, wherein adjacent URLs identify documents located on the same content server;

  • wherein each of the plurality of pullers is responsive to the director for:

    • receiving the assigned portion of the list of URLs;

    • queuing requests to retrieve documents identified by the received portion of the list of URLs wherein the requests having different URLs are queued by the puller;

    • determining if the URL of a first queued request is adjacent to the URL of a document being currently downloaded;

    • if the URL of the first queued request is adjacent to the URL of a document being currently downloaded, waiting until the currently downloading document has been received before initiating the first queued request to avoid overlapping requests to the content server;

    • if the URL of the queued request is not adjacent to the URL of a document being currently downloaded, initiating the first queued request; and

  • a proxy gateway responsive to each of the pullers for receiving the initiated requests to retrieve documents, and for retrieving documents corresponding to the list of URL from the content servers via the routers.
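
The director and puller steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that two URLs are "adjacent" when they share a host name (the claim only says they are located on the same content server), it balances whole hosts across pullers with a simple least-loaded rule that the claim does not specify, and it leaves out the proxy gateway and routers entirely. The names `host_of`, `is_adjacent`, `assign_portions`, and `next_request` are hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part of a URL (assumed stand-in for 'content server')."""
    return urlsplit(url).netloc.lower()


def is_adjacent(url_a, url_b):
    """Assumption: URLs are adjacent when they name the same host."""
    return host_of(url_a) == host_of(url_b)


def assign_portions(urls, num_pullers):
    """Director step: every URL of a given host goes to exactly one puller,
    so no two pullers ever request adjacent URLs."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[host_of(url)].append(url)

    portions = [[] for _ in range(num_pullers)]
    # Hypothetical balancing rule: largest hosts first, to the least-loaded puller.
    for host in sorted(by_host, key=lambda h: (-len(by_host[h]), h)):
        target = min(range(num_pullers), key=lambda i: len(portions[i]))
        portions[target].extend(by_host[host])
    return portions


def next_request(queue, in_flight):
    """Puller step: initiate the first queued URL unless it is adjacent to a
    document still downloading; return None to signal that the puller waits."""
    if not queue:
        return None
    first = queue[0]
    if any(is_adjacent(first, active) for active in in_flight):
        return None
    return queue.popleft()


# Example wiring with made-up URLs: one puller's queue and its active downloads.
urls = [
    "http://a.example/1", "http://a.example/2",
    "http://b.example/1", "http://c.example/1",
]
portions = assign_portions(urls, 2)
queue = deque(portions[0])
in_flight = set()
```

A puller loop would call `next_request` repeatedly, add each returned URL to `in_flight`, and remove it once the document has been received. A `None` result while the queue is non-empty corresponds to the claim's "waiting until the currently downloading document has been received".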
