System and method for enabling website owner to manage crawl rate in a website indexing system
Abstract
Web crawlers crawl websites to access documents of the website for purposes of indexing the documents for search engines. The web crawlers crawl a specified website at a crawl rate that is based on multiple factors. One of the factors is a pre-set crawl rate limit. According to certain embodiments, an owner for a specified website is enabled to modify the crawl rate limit for the specified website when one or more pre-set criteria are met.
39 Claims
1. A method of indexing documents in websites, the method comprising:
on a server system having one or more processors and memory storing programs to be executed by the one or more processors:
storing, for each website of a multiplicity of websites, a corresponding current crawl rate limit;
comparing a maximum crawl rate for a respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website;
performing a website crawling management function in accordance with the determination of whether the current crawl rate limit is the limiting factor in crawling the respective website; and
providing a crawl rate control mechanism to a respective owner of the respective website, wherein the crawl rate control mechanism enables selection of a new crawl rate limit corresponding to the respective website by the respective owner.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 34, 35
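The steps recited in claim 1 can be illustrated with a minimal sketch. It is not taken from the patent; it only shows one plausible reading of the claim language. The names `SiteCrawlState`, `limit_is_limiting_factor`, `allowed_choices`, and `select_new_limit`, the `tolerance` and `faster_factor` parameters, the requests-per-second unit, and the choice to offer a "faster" option only when the limit is the bottleneck are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SiteCrawlState:
    """Hypothetical per-website record (all names are illustrative)."""
    current_limit: float  # stored crawl rate limit, assumed in requests/second
    # Crawl rates sampled over the defined period of time.
    observed_rates: List[float] = field(default_factory=list)


def limit_is_limiting_factor(state: SiteCrawlState, tolerance: float = 0.95) -> bool:
    """Compare the maximum observed crawl rate with the current limit.

    Assumption: the limit counts as "a limiting factor" when the peak rate
    reached at least `tolerance` times the limit.
    """
    if not state.observed_rates:
        return False
    return max(state.observed_rates) >= tolerance * state.current_limit


def allowed_choices(state: SiteCrawlState, faster_factor: float = 2.0) -> Dict[str, float]:
    """One possible "management function": decide which limits the owner may pick.

    Assumption: a faster limit is offered only when the current limit is
    what actually held the crawler back.
    """
    choices = {
        "slower": state.current_limit / 2,
        "current": state.current_limit,
    }
    if limit_is_limiting_factor(state):
        choices["faster"] = state.current_limit * faster_factor
    return choices


def select_new_limit(state: SiteCrawlState, choice: str) -> float:
    """Crawl rate control mechanism: apply the owner's selection."""
    options = allowed_choices(state)
    if choice not in options:
        raise ValueError(f"choice {choice!r} is not available for this site")
    state.current_limit = options[choice]
    return state.current_limit


if __name__ == "__main__":
    busy = SiteCrawlState(current_limit=2.0, observed_rates=[0.5, 1.9, 2.0])
    idle = SiteCrawlState(current_limit=2.0, observed_rates=[0.1, 0.3])
    print(sorted(allowed_choices(busy)))
    print(sorted(allowed_choices(idle)))
```

In this sketch the comparison step gates what the control mechanism offers: a site whose crawler never approached the limit gains nothing from a higher limit, so the faster option is withheld. Other management functions consistent with the claim wording are equally possible.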
12. A computer system comprising:
memory;
one or more processors; and
at least one program stored in the memory and executed by the one or more processors, the at least one program including instructions for:
storing, for each website of a multiplicity of websites, a corresponding current crawl rate limit;
comparing a maximum crawl rate for a respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website;
performing a website crawling management function in accordance with the determination of whether the current crawl rate limit is the limiting factor in crawling the respective website; and
providing a crawl rate control mechanism to a respective owner of the respective website, wherein the crawl rate control mechanism enables selection of a new crawl rate limit corresponding to the respective website by the respective owner.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 36, 37
23. A non-transitory computer readable storage medium storing at least one program for execution by a computer system, the at least one program comprising instructions for:
storing, for each website of a multiplicity of websites, a corresponding current crawl rate limit;
comparing a maximum crawl rate for the respective website over a defined period of time with the current crawl rate limit for crawling the respective website to determine if the current crawl rate limit is a limiting factor in crawling the respective website; and
performing a website crawling management function in accordance with the determination of whether the current crawl rate limit is the limiting factor in crawling the respective website; and
providing a crawl rate control mechanism to a respective owner of the respective website, wherein the crawl rate control mechanism enables selection of a new crawl rate limit corresponding to the respective website by the respective owner.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 38, 39
Specification