System and Method for Enabling Website Owners to Manage Crawl Rate in a Website Indexing System
Abstract
Web crawlers crawl websites to access the documents of each website so that the documents can be indexed for search engines. A web crawler crawls a specified website at a crawl rate that is based on multiple factors, one of which is a pre-set crawl rate limit. According to certain embodiments, the owner of a specified website is enabled to modify the crawl rate limit for that website when one or more pre-set criteria are met.
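The relationship the abstract describes, a crawl rate derived from several factors with the pre-set limit as one of them, and an owner who may change that limit only when pre-set criteria are met, might be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name the other factors or the criteria, so `demand_rate`, `owner_verified`, and the rule in `owner_may_modify` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteCrawlState:
    """Per-site crawl settings. All field names are hypothetical."""
    crawl_rate_limit: float       # pre-set upper bound, in requests per second
    demand_rate: float            # rate the crawler would choose from its other factors
    owner_verified: bool = False  # assumed criterion: site ownership has been confirmed


def effective_crawl_rate(site: SiteCrawlState) -> float:
    """The limit is one factor among several; in this sketch it simply caps the rate."""
    return min(site.demand_rate, site.crawl_rate_limit)


def owner_may_modify(site: SiteCrawlState) -> bool:
    """Assumed pre-set criteria: a verified owner, and a limit that is actually binding."""
    return site.owner_verified and site.demand_rate >= site.crawl_rate_limit


def set_crawl_rate_limit(site: SiteCrawlState, new_limit: float) -> bool:
    """Apply the owner's new limit only when the assumed criteria are met."""
    if new_limit <= 0 or not owner_may_modify(site):
        return False
    site.crawl_rate_limit = new_limit
    return True
```

In this sketch a site whose crawler is already running below the limit cannot have the limit changed, since changing it would have no effect on the crawl; an actual embodiment may use different criteria.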
26 Citations
36 Claims
1. A computer-implemented method of indexing documents in websites, the method comprising:
on a server system having one or more processors and memory storing programs to be executed by the one or more processors;
for each website of a multiplicity of websites, each website having a corresponding current crawl rate limit;
crawling the respective website, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
storing crawl data associated with the crawling of the respective website; and
providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing for display to the respective owner at least a portion of the crawl data, and enabling selection of a new crawl rate limit corresponding to the respective website by the respective owner.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer system comprising:
memory;
one or more processors; and
at least one program stored in the memory and executed by the one or more processors, the at least one program including;
web crawl control instructions for controlling crawling of each website of a multiplicity of websites, each website having a corresponding current crawl rate limit, the web crawl control instructions including;
instructions for crawling a respective website of the multiplicity of websites, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
instructions for storing crawl data associated with the crawling of the respective website; and
instructions for providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing, for display to the respective owner, at least a portion of the crawl data, and enabling selection, by the respective owner, of a new crawl rate limit corresponding to the respective website.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
web crawl control instructions for controlling crawling of each website of a multiplicity of websites, each website having a corresponding current crawl rate limit, the web crawl control instructions including;
instructions for crawling a respective website of the multiplicity of websites, in accordance with the current crawl rate limit corresponding to the respective website, to download documents from the respective website for inclusion in a database;
instructions for storing crawl data associated with the crawling of the respective website; and
instructions for providing, for display, a crawl rate control mechanism to a respective owner of the respective website, including providing, for display to the respective owner, at least a portion of the crawl data, and enabling selection, by the respective owner, of a new crawl rate limit corresponding to the respective website.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
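The three steps shared by the independent claims (crawling under a per-site rate limit, storing crawl data, and presenting a control mechanism that shows some of that data and accepts a new limit) can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the claimed implementation: the names `Website`, `CrawlRecord`, `crawl`, `crawl_rate_control_view`, and `select_new_limit` are hypothetical, the rate limit is assumed to be expressed in requests per second, and `fetch` is an injected callable so the example needs no network access.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CrawlRecord:
    """One stored item of crawl data (fields chosen for illustration)."""
    url: str
    bytes_downloaded: int
    elapsed_s: float


@dataclass
class Website:
    urls: List[str]
    crawl_rate_limit: float  # assumed unit: maximum requests per second
    crawl_data: List[CrawlRecord] = field(default_factory=list)


def crawl(site: Website,
          fetch: Callable[[str], bytes],
          database: Dict[str, bytes],
          sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.monotonic) -> None:
    """Download each document, pacing requests to respect the site's limit."""
    min_interval = 1.0 / site.crawl_rate_limit
    for url in site.urls:
        start = clock()
        body = fetch(url)
        elapsed = clock() - start
        database[url] = body  # include the document in the database
        site.crawl_data.append(CrawlRecord(url, len(body), elapsed))
        remaining = min_interval - elapsed
        if remaining > 0:
            sleep(remaining)  # wait out the rest of the per-request interval


def crawl_rate_control_view(site: Website, options: List[float]) -> dict:
    """Data for a control page: a portion of the crawl data plus selectable limits."""
    return {
        "current_limit": site.crawl_rate_limit,
        "pages_crawled": len(site.crawl_data),
        "bytes_downloaded": sum(r.bytes_downloaded for r in site.crawl_data),
        "selectable_limits": list(options),
    }


def select_new_limit(site: Website, new_limit: float, options: List[float]) -> None:
    """Record the owner's selection; only offered values are accepted."""
    if new_limit not in options:
        raise ValueError(f"{new_limit} is not one of the offered limits")
    site.crawl_rate_limit = new_limit
```

Passing `sleep` and `clock` as parameters keeps the pacing logic testable without real delays; a production crawler would more likely schedule requests across many sites than block on a single one.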
Specification