Web crawler scheduler that utilizes sitemaps from websites

  • US 9,355,177 B2
  • Filed: 01/27/2015
  • Issued: 05/31/2016
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A method of scheduling documents for crawling, performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:

  • identifying an updated sitemap using a last change date in a sitemap index, the sitemap index including a list of sitemaps for a website, each sitemap having a URL and a last change date;

  • updating sitemap information for the sitemap by downloading updated sitemap information, wherein the sitemap information includes a list of URLs corresponding to documents stored at the website and each URL is associated with two or more of: a last modification date for the URL, a change frequency of a document specified by the URL, and a priority of the document; and

  • scheduling documents for crawling in accordance with the updated sitemap information.
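
The three steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the public sitemaps.org XML format (`sitemapindex`, `urlset`, `lastmod`, `changefreq`, `priority`), replaces the download step with a hypothetical in-memory `FAKE_SITE` lookup, and uses a simple ordering (priority first, then most recent modification) that the claim does not prescribe. All function names, URLs, and sample data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by the public sitemaps.org protocol (assumed format).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def updated_sitemaps(index_xml, last_seen):
    """Step 1: return sitemap URLs whose last change date in the index
    is newer than the date recorded at the previous crawl."""
    root = ET.fromstring(index_xml)
    updated = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS).strip()
        lastmod = date.fromisoformat(entry.findtext("sm:lastmod", namespaces=NS).strip())
        if loc not in last_seen or lastmod > last_seen[loc]:
            updated.append(loc)
    return updated

def parse_sitemap(sitemap_xml):
    """Step 2: extract each URL with its last modification date,
    change frequency, and priority (0.5 is the protocol default)."""
    root = ET.fromstring(sitemap_xml)
    records = []
    for url in root.findall("sm:url", NS):
        records.append({
            "loc": url.findtext("sm:loc", namespaces=NS).strip(),
            "lastmod": url.findtext("sm:lastmod", default="", namespaces=NS).strip(),
            "changefreq": url.findtext("sm:changefreq", default="", namespaces=NS).strip(),
            "priority": float(url.findtext("sm:priority", default="0.5", namespaces=NS)),
        })
    return records

def schedule(records):
    """Step 3 (illustrative policy only): highest priority first,
    ties broken by most recent last modification date."""
    by_date = sorted(records, key=lambda r: r["lastmod"], reverse=True)
    return [r["loc"] for r in sorted(by_date, key=lambda r: r["priority"], reverse=True)]

# Hypothetical sample data standing in for a real website.
INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-a.xml</loc><lastmod>2015-01-20</lastmod></sitemap>
  <sitemap><loc>https://example.com/sitemap-b.xml</loc><lastmod>2014-12-01</lastmod></sitemap>
</sitemapindex>"""

SITEMAP_A_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news</loc><lastmod>2015-01-19</lastmod>
       <changefreq>daily</changefreq><priority>0.9</priority></url>
  <url><loc>https://example.com/about</loc><lastmod>2014-06-01</lastmod>
       <changefreq>yearly</changefreq><priority>0.3</priority></url>
  <url><loc>https://example.com/blog</loc><lastmod>2015-01-20</lastmod>
       <changefreq>daily</changefreq><priority>0.9</priority></url>
</urlset>"""

# Stand-in for the download step: maps a sitemap URL to its XML body.
FAKE_SITE = {"https://example.com/sitemap-a.xml": SITEMAP_A_XML}

# Last change dates recorded at the previous crawl (hypothetical state).
LAST_SEEN = {
    "https://example.com/sitemap-a.xml": date(2015, 1, 1),
    "https://example.com/sitemap-b.xml": date(2014, 12, 1),
}

queue = []
for sitemap_url in updated_sitemaps(INDEX_XML, LAST_SEEN):
    queue.extend(schedule(parse_sitemap(FAKE_SITE[sitemap_url])))
print(queue)
```

Only `sitemap-a.xml` is re-read here, because its last change date in the index is newer than the stored one; `sitemap-b.xml` is skipped without being downloaded, which is the saving the sitemap index is meant to provide.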
