
Web crawler scheduler that utilizes sitemaps from websites

  • US 8,417,686 B2
  • Filed: 10/11/2011
  • Issued: 04/09/2013
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A method of scheduling documents for crawling, performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:

  • storing sitemap information for a plurality of websites, wherein the information includes a predicted update period for at least a plurality of documents identified by the sitemap information;

  • analyzing the stored sitemap information to identify a respective website having sitemap information that is at least potentially out of date;

  • updating the stored sitemap information for the identified respective website by downloading updated sitemap information for the identified respective website; and

  • scheduling documents for crawling in accordance with the updated stored sitemap information for the identified respective website.
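The four claimed steps can be illustrated with a small scheduler loop. This is only a minimal sketch under stated assumptions, not the patented implementation. The claim does not say how the predicted update period is obtained, so the sketch derives it from the sitemaps.org `<changefreq>` element. It also does not say what makes stored sitemap information "potentially out of date", so the sketch uses a fixed per-site `refresh_interval`. Every name here (`SiteRecord`, `refresh_interval`, `DEFAULT_PERIOD`, `fetch_sitemap`, `run_cycle`, the seconds-per-`changefreq` table) is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: the predicted update period (seconds) is derived from <changefreq>.
# None means "never" (no recrawl is scheduled); months and years are approximated.
CHANGEFREQ_SECONDS: Dict[str, Optional[int]] = {
    "always": 0,
    "hourly": 3600,
    "daily": 86400,
    "weekly": 7 * 86400,
    "monthly": 30 * 86400,
    "yearly": 365 * 86400,
    "never": None,
}
DEFAULT_PERIOD = 86400  # hypothetical fallback when <changefreq> is absent


@dataclass
class SiteRecord:
    """Step 1: stored sitemap information for one website (hypothetical schema)."""
    site: str
    fetched_at: int        # when the sitemap was last downloaded (seconds)
    refresh_interval: int  # assumption: how long the stored sitemap is trusted
    periods: Dict[str, Optional[int]] = field(default_factory=dict)  # URL -> period


def parse_sitemap(xml_text: str) -> Dict[str, Optional[int]]:
    """Map each <loc> in a <urlset> sitemap to its predicted update period."""
    root = ET.fromstring(xml_text)
    periods: Dict[str, Optional[int]] = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        freq = url.findtext("sm:changefreq", default="", namespaces=SITEMAP_NS).strip().lower()
        if loc:
            periods[loc] = CHANGEFREQ_SECONDS.get(freq, DEFAULT_PERIOD)
    return periods


def stale_records(store: Dict[str, SiteRecord], now: int) -> List[SiteRecord]:
    """Step 2: sites whose stored sitemap is at least potentially out of date."""
    return [r for r in store.values() if now - r.fetched_at >= r.refresh_interval]


def refresh(record: SiteRecord, fetch_sitemap: Callable[[str], str], now: int) -> None:
    """Step 3: download the updated sitemap and replace the stored information."""
    record.periods = parse_sitemap(fetch_sitemap(record.site))
    record.fetched_at = now


def schedule(record: SiteRecord, now: int) -> List[Tuple[int, str]]:
    """Step 4: (next_crawl_time, url) pairs ordered by time; 'never' pages are skipped."""
    return sorted((now + p, url) for url, p in record.periods.items() if p is not None)


def run_cycle(store: Dict[str, SiteRecord],
              fetch_sitemap: Callable[[str], str],
              now: int) -> List[Tuple[int, str]]:
    """One scheduler pass: refresh stale sitemaps, then schedule their documents."""
    plan: List[Tuple[int, str]] = []
    for record in stale_records(store, now):
        refresh(record, fetch_sitemap, now)
        plan.extend(schedule(record, now))
    return sorted(plan)


# Illustrative sitemap; example.com and these pages are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news</loc><changefreq>hourly</changefreq></url>
  <url><loc>https://example.com/about</loc><changefreq>yearly</changefreq></url>
  <url><loc>https://example.com/archive</loc><changefreq>never</changefreq></url>
</urlset>"""

if __name__ == "__main__":
    store = {"example.com": SiteRecord("example.com", fetched_at=0, refresh_interval=86400)}
    for when, url in run_cycle(store, lambda site: SAMPLE_SITEMAP, now=100000):
        print(when, url)
```

The sketch passes `fetch_sitemap` in as a parameter so it runs without network access. A real crawler would supply an HTTP download function at that point. A second call to `run_cycle` shortly after the first schedules nothing, because the refreshed site is no longer considered out of date.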
