
Web Crawler Scheduler that Utilizes Sitemaps from Websites

  • US 20100262592A1
  • Filed: 06/25/2010
  • Published: 10/14/2010
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A method of scheduling documents for crawling, performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:

  • receiving from a website a notification that includes a sitemap URL corresponding to a sitemap for the website;

  • in response to the notification:

      • accessing the sitemap at the sitemap URL; and

      • retrieving from the sitemap document location information and metadata for a plurality of documents associated with the website;

  • scheduling for downloading documents, from among the plurality of documents, based at least in part on the metadata retrieved from the sitemap; and

  • downloading at least a subset of the documents scheduled for downloading.
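
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the sitemap follows the public sitemaps.org XML format, that `lastmod` values begin with a full `YYYY-MM-DD` date, and that documents are ordered by `priority` and then by recency. The names `SitemapEntry`, `parse_sitemap`, `schedule`, `crawl_on_notification`, and the injected `fetch` callable are hypothetical and do not appear in the claim.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List
import xml.etree.ElementTree as ET

# Namespace used by the public sitemaps.org protocol (assumed format).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class SitemapEntry:
    loc: str         # document location information
    lastmod: date    # metadata: last modification date
    priority: float  # metadata: relative priority within the site


def parse_sitemap(xml_text: str) -> List[SitemapEntry]:
    """Retrieve document locations and metadata from sitemap XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if not loc:
            continue
        raw_mod = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        # Simplification: only the leading YYYY-MM-DD part is read;
        # a missing date sorts as the oldest possible value.
        lastmod = date.fromisoformat(raw_mod[:10]) if len(raw_mod) >= 10 else date.min
        raw_pri = (url.findtext("sm:priority", default="", namespaces=NS) or "").strip()
        # The sitemaps.org protocol gives 0.5 as the default priority.
        priority = float(raw_pri) if raw_pri else 0.5
        entries.append(SitemapEntry(loc, lastmod, priority))
    return entries


def schedule(entries: List[SitemapEntry]) -> List[SitemapEntry]:
    """Order documents for download: higher priority first, then newer first."""
    return sorted(entries, key=lambda e: (e.priority, e.lastmod), reverse=True)


def crawl_on_notification(
    sitemap_url: str, fetch: Callable[[str], str], budget: int
) -> Dict[str, str]:
    """Handle a notification carrying a sitemap URL.

    `fetch` is a caller-supplied function mapping a URL to its body,
    so the sketch itself performs no network access.
    """
    entries = parse_sitemap(fetch(sitemap_url))  # access sitemap, retrieve metadata
    queue = schedule(entries)                    # schedule based on the metadata
    # Download at least a subset of the scheduled documents.
    return {e.loc: fetch(e.loc) for e in queue[:budget]}
```

Passing `fetch` as a parameter keeps the scheduling logic separate from the transport, so the same ordering can be exercised against an in-memory dictionary of pages or a real HTTP client.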
