
Web Crawler Scheduler that Utilizes Sitemaps from Websites

  • US 20130226898A1
  • Filed: 04/08/2013
  • Published: 08/29/2013
  • Est. Priority Date: 05/31/2005
  • Status: Active Grant
First Claim

1. A method of scheduling documents for crawling, performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:

  • obtaining sitemap information for a plurality of websites;

  • analyzing the sitemap information to identify a website, in the plurality of websites, having sitemap information that is at least potentially out of date;

  • updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and

  • scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
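
The four steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how a sitemap is judged "at least potentially out of date", so the age threshold `max_age` used here is an assumption, as are the names `SitemapInfo`, `find_stale_sites`, `schedule_crawl`, and the injected `fetch` callable that stands in for the download step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SitemapInfo:
    """Locally stored copy of one website's sitemap (hypothetical structure)."""
    urls: List[str]     # document URLs listed in the sitemap
    fetched_at: float   # time, in seconds, when this copy was downloaded


def find_stale_sites(sitemaps: Dict[str, SitemapInfo],
                     now: float, max_age: float) -> List[str]:
    """Identify sites whose stored sitemap is potentially out of date.

    Assumption: "potentially out of date" means older than max_age seconds.
    """
    return [site for site, info in sitemaps.items()
            if now - info.fetched_at > max_age]


def schedule_crawl(sitemaps: Dict[str, SitemapInfo],
                   fetch: Callable[[str], List[str]],
                   now: float, max_age: float) -> List[str]:
    """Refresh stale sitemaps and return the URLs scheduled for crawling."""
    queue: List[str] = []
    for site in find_stale_sites(sitemaps, now, max_age):
        # Update the stored sitemap by downloading a fresh copy.
        sitemaps[site] = SitemapInfo(urls=fetch(site), fetched_at=now)
        # Schedule documents according to the updated sitemap.
        queue.extend(sitemaps[site].urls)
    return queue
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP client or sitemap parser. A real scheduler would likely also prioritize the queued URLs, which this sketch leaves out.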
