System and method for enabling website owners to manage crawl rate in a website indexing system
First Claim
1. A computer-implemented method of indexing documents in websites, the method comprising:
- on a server system having one or more processors and memory storing programs to be executed by the one or more processors;
for each website of a multiplicity of websites, each website having a corresponding current crawl rate limit;
crawling the respective website, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
storing crawl data associated with the crawling of the respective website;
providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing for display to the respective owner at least a portion of the crawl data, and enabling selection of a new crawl rate limit corresponding to the respective website by the respective owner;
comparing a maximum crawl rate for the respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website; and
in response to a request to increase a current crawl rate for crawling the respective website, increasing the current crawl rate limit only when the current crawl rate limit is a limiting factor in crawling the respective website.
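The last two steps of the claim (comparing the maximum crawl rate over a defined period with the current limit, then granting an increase only when the limit is the limiting factor) can be illustrated with a minimal sketch. The class name, field names, the `tolerance` parameter, and the use of requests per second as the unit are all assumptions made for illustration; the claim does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SiteCrawlState:
    """Hypothetical per-website record; names and units are illustrative."""
    crawl_rate_limit: float  # current crawl rate limit, in requests/second
    # Crawl rates sampled during the defined period (part of the stored crawl data).
    observed_rates: List[float] = field(default_factory=list)


def limit_is_limiting_factor(state: SiteCrawlState, tolerance: float = 0.0) -> bool:
    """Return True when the peak observed rate reached the current limit."""
    if not state.observed_rates:
        return False  # no crawl data, so nothing shows the limit was binding
    return max(state.observed_rates) >= state.crawl_rate_limit - tolerance


def request_increase(state: SiteCrawlState, requested_limit: float) -> bool:
    """Apply an owner's increase request only if the limit is the limiting factor."""
    if requested_limit <= state.crawl_rate_limit:
        return False  # not an increase; handled elsewhere in a real system
    if not limit_is_limiting_factor(state):
        return False  # the crawler never reached the limit, so raising it is refused
    state.crawl_rate_limit = requested_limit
    return True
```

Under these assumptions, a site whose peak rate hit its limit of 2.0 would have a request for 4.0 granted, while a site that peaked at 0.8 against the same limit would keep its limit unchanged.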
Abstract
Web crawlers crawl websites to access documents of the website for purposes of indexing the documents for search engines. The web crawlers crawl a specified website at a crawl rate that is based on multiple factors. One of the factors is a pre-set crawl rate limit. According to certain embodiments, an owner for a specified website is enabled to modify the crawl rate limit for the specified website when one or more pre-set criteria are met.
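As a rough illustration of how a pre-set crawl rate limit might constrain a crawler, the sketch below spaces fetches so that a site is never crawled faster than its limit. It covers only the limit itself, not the other factors the abstract mentions; the `SiteThrottle` class, its method names, and the requests-per-second unit are hypothetical and not taken from the patent.

```python
class SiteThrottle:
    """Hypothetical per-site throttle enforcing a crawl rate limit."""

    def __init__(self, crawl_rate_limit: float) -> None:
        if crawl_rate_limit <= 0:
            raise ValueError("crawl_rate_limit must be positive")
        # Minimum spacing between fetches, in seconds.
        self.min_interval = 1.0 / crawl_rate_limit
        self.next_allowed = 0.0

    def try_acquire(self, now: float) -> bool:
        """Return True and reserve a slot if a fetch is allowed at time `now`."""
        if now < self.next_allowed:
            return False  # too soon since the last fetch for this site
        self.next_allowed = now + self.min_interval
        return True
```

Passing the clock value in as `now` keeps the sketch deterministic; a real crawler would more likely read a monotonic clock and queue the deferred fetch rather than drop it.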
67 Citations
36 Claims
1. A computer-implemented method of indexing documents in websites, the method comprising:
on a server system having one or more processors and memory storing programs to be executed by the one or more processors;
for each website of a multiplicity of websites, each website having a corresponding current crawl rate limit;
crawling the respective website, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
storing crawl data associated with the crawling of the respective website;
providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing for display to the respective owner at least a portion of the crawl data, and enabling selection of a new crawl rate limit corresponding to the respective website by the respective owner;
comparing a maximum crawl rate for the respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website; and
in response to a request to increase a current crawl rate for crawling the respective website, increasing the current crawl rate limit only when the current crawl rate limit is a limiting factor in crawling the respective website.
Dependent claims: 2-13
14. A computer system comprising:
memory;
one or more processors; and
at least one program stored in the memory and executed by the one or more processors, the at least one program including:
web crawl control instructions for controlling crawling of each website of a multiplicity of websites, each website having a corresponding current crawl rate limit, the web crawl control instructions including:
instructions for crawling a respective website of the multiplicity of websites, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
instructions for storing crawl data associated with the crawling of the respective website;
instructions for providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing, for display to the respective owner, at least a portion of the crawl data, and enabling selection, by the respective owner, of a new crawl rate limit corresponding to the respective website;
instructions for comparing a maximum crawl rate for the respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website; and
instructions for responding to a request to increase the current crawl rate for crawling the respective website by increasing the current crawl rate limit only when the current crawl rate limit is a limiting factor in crawling the respective website.
Dependent claims: 15-25
26. A computer readable storage medium storing one or more programs for execution by one or more processors of a computer system, the one or more programs comprising:
web crawl control instructions for controlling crawling of each website of a multiplicity of websites, each website having a corresponding current crawl rate limit, the web crawl control instructions including:
instructions for crawling a respective website of the multiplicity of websites, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
instructions for storing crawl data associated with the crawling of the respective website;
instructions for providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing, for display to the respective owner, at least a portion of the crawl data, and enabling selection, by the respective owner, of a new crawl rate limit corresponding to the respective website;
instructions for comparing a maximum crawl rate for the respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website; and
instructions for responding to a request to increase the current crawl rate for crawling the respective website by increasing the current crawl rate limit only when the current crawl rate limit is a limiting factor in crawling the respective website.
Dependent claims: 27-36
Specification