Limiting requests by web crawlers to a web host
First Claim
1. A computer-implemented method of limiting requests to a web host by multiple competing web crawlers of a search engine, comprising on a server system:
receiving from a plurality of web crawlers a stream of capacity requests for a plurality of web hosts, each web host having a specified maximum allowed load level comprising a predefined number of web crawler download requests per unit time;
for each pair of requesting web crawler and requested web host, creating a lease between the web host and the web crawler, the lease including an identity of the web crawler, an identity of the web host, a load capacity allocated to the web crawler and a lease update time prior to a lease expire time at which the lease expires unless the lease is extended;
wherein the load capacity allocated to the web crawler comprises a specified number of download requests per unit time; and
upon arrival of a respective lease's lease update time and satisfaction of a predefined condition, automatically updating the respective lease between a respective web crawler of the plurality of web crawlers and a respective web host of the plurality of web hosts by granting the web crawler an updated share of the web host's maximum allowed load level, the updated lease having an updated lease expire time later than the lease update time;
wherein creating the lease includes limiting the load capacity allocated to the requesting web crawler such that a sum of the load capacity allocated to each of the web crawlers having a lease with the requested web host is no greater than the requested web host's maximum allowed load level.
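The capacity limit in the final clause can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Lease`, `create_lease`, and `max_load`, the 30/60-second timing defaults, and the policy of granting whatever capacity remains unallocated are all assumptions made for illustration. The claim itself specifies only the lease fields and the requirement that the summed allocations not exceed the host's maximum allowed load level.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    crawler_id: str     # identity of the web crawler
    host_id: str        # identity of the web host
    capacity: int       # download requests per unit time granted to the crawler
    update_time: float  # time at which the lease comes up for renewal
    expire_time: float  # time at which the lease lapses unless extended


def create_lease(leases, max_load, crawler_id, host_id, requested, now,
                 update_after=30.0, expire_after=60.0):
    """Grant the requested capacity, capped by what the host has left.

    `leases` is a list of existing Lease objects and `max_load` maps a
    host id to its maximum allowed load (both hypothetical layouts).
    """
    allocated = sum(l.capacity for l in leases if l.host_id == host_id)
    remaining = max_load[host_id] - allocated
    granted = max(0, min(requested, remaining))
    lease = Lease(crawler_id, host_id, granted,
                  now + update_after, now + expire_after)
    leases.append(lease)
    return lease


# Two crawlers compete for a host that allows 10 requests per unit time.
leases = []
max_load = {"example.com": 10}
a = create_lease(leases, max_load, "crawler-a", "example.com", 6, now=0.0)
b = create_lease(leases, max_load, "crawler-b", "example.com", 6, now=0.0)
print(a.capacity, b.capacity)
```

Under these assumptions the second crawler receives only the capacity left over after the first grant, so the total across both leases stays within the host's limit, and each lease's update time precedes its expire time.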
Abstract
A host load server balances a web host's load capacity among multiple competing web crawlers of a search engine. The host load server establishes a lease for each pair of requesting web crawler and requested web host. The lease expires at a scheduled time. If the web crawler completes its mission of retrieving documents from the web host prior to the expiration of the lease, the host load server releases the load capacity allocated to the web crawler and makes it available for other competing web crawlers. If the web crawler submits a request for renewing its lease with the web host at the scheduled time, the host load server allocates another share of load capacity to the web crawler. If the web crawler does not submit any request at the scheduled time, the host load server terminates the lease and releases the load capacity for other web crawlers.
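The three outcomes described in the abstract (early completion, renewal, and no request at the scheduled time) can be sketched as a single function. This is only an illustrative sketch: the function name `settle_lease`, the event labels `"done"`, `"renew"`, and `"silent"`, the dictionary layout of a lease, and the choice to renew with an unchanged share are assumptions, not details taken from the patent.

```python
def settle_lease(lease, free_capacity, event, now, term=60.0):
    """Apply one lifecycle event and return (lease_or_None, free_capacity).

    `lease` is a dict with "capacity" and "expire_time" keys (hypothetical).
    `free_capacity` is the host's currently unallocated load.
    """
    if event == "renew":
        # The crawler asked to renew at the scheduled time: extend the lease.
        # Assumption: the renewed share equals the current share.
        renewed = dict(lease, expire_time=now + term)
        return renewed, free_capacity
    if event in ("done", "silent"):
        # Either the crawler finished early or it sent no request at the
        # scheduled time: end the lease and return its capacity to the pool.
        return None, free_capacity + lease["capacity"]
    raise ValueError(f"unknown event: {event}")


lease = {"crawler": "crawler-a", "host": "example.com",
         "capacity": 4, "expire_time": 60.0}
print(settle_lease(lease, 6, "done", now=10.0))
print(settle_lease(lease, 6, "renew", now=60.0))
```

In both the completion and the no-request cases the released capacity becomes available to other crawlers, which is the behavior the abstract attributes to the host load server.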
27 Claims
1. A computer-implemented method of limiting requests to a web host by multiple competing web crawlers of a search engine, comprising on a server system:
receiving from a plurality of web crawlers a stream of capacity requests for a plurality of web hosts, each web host having a specified maximum allowed load level comprising a predefined number of web crawler download requests per unit time;
for each pair of requesting web crawler and requested web host, creating a lease between the web host and the web crawler, the lease including an identity of the web crawler, an identity of the web host, a load capacity allocated to the web crawler and a lease update time prior to a lease expire time at which the lease expires unless the lease is extended;
wherein the load capacity allocated to the web crawler comprises a specified number of download requests per unit time; and
upon arrival of a respective lease's lease update time and satisfaction of a predefined condition, automatically updating the respective lease between a respective web crawler of the plurality of web crawlers and a respective web host of the plurality of web hosts by granting the web crawler an updated share of the web host's maximum allowed load level, the updated lease having an updated lease expire time later than the lease update time;
wherein creating the lease includes limiting the load capacity allocated to the requesting web crawler such that a sum of the load capacity allocated to each of the web crawlers having a lease with the requested web host is no greater than the requested web host's maximum allowed load level.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A system for limiting requests to a web host by multiple competing web crawlers of a search engine, comprising:
one or more central processing units for executing programs; and
a host load allocation module executable by the one or more central processing units, the module comprising:
instructions for creating a lease between the web host and a requesting web crawler, the lease including an identity of the requesting web crawler, an identity of the web host, a load capacity allocated to the requesting web crawler and a lease update time prior to a lease expire time at which the lease expires unless the lease is extended;
wherein the web host has a specified maximum allowed load level comprising a predefined number of web crawler download requests per unit time, and the load capacity allocated to the requesting web crawler comprises a specified number of download requests per unit time;
the instructions for creating a lease including:
instructions for limiting the load capacity allocated to the requesting web crawler such that a sum of load capacity allocated to each of the web crawlers having a lease with the web host is no greater than the web host's maximum allowed load level;
instructions for, upon arrival of the lease's lease update time and satisfaction of a predefined condition, automatically updating the lease between the web host and the web crawler at the lease's lease update time by granting the web crawler an updated share of the web host's maximum allowed load level, the updated lease having an updated lease expire time later than the lease update time;
instructions for terminating the lease between the web host and the web crawler at the lease's lease update time if the predefined condition is not satisfied or per the web crawler's request; and
a plurality of data structures storing information of a plurality of web hosts, a plurality of web crawlers and a plurality of leases between the web hosts and the web crawlers.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
19. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism for limiting requests to a web host by multiple competing web crawlers of a search engine, wherein the computer program mechanism is embedded in the computer readable storage medium, the computer program mechanism comprising:
instructions for creating a lease between the web host and a requesting web crawler, the lease including an identity of the requesting web crawler, an identity of the web host, a load capacity allocated to the requesting web crawler and a lease update time prior to a lease expire time at which the lease expires unless the lease is extended;
wherein the web host has a specified maximum allowed load level comprising a predefined number of web crawler download requests per unit time, and the load capacity allocated to the requesting web crawler comprises a specified number of download requests per unit time, the instructions for creating a lease including:
instructions for limiting the load capacity allocated to the requesting web crawler such that a sum of load capacity allocated to each of the web crawlers having a respective lease with the web host is no greater than the web host's maximum allowed load level;
instructions for, upon arrival of the lease update time and satisfaction of a predefined condition, automatically updating a lease between the web host and the web crawler at the lease update time by granting the web crawler an updated share of the web host's maximum allowed load level, the updated lease having an updated lease expire time later than the lease update time;
instructions for terminating a lease between the web host and the web crawler at the lease update time if the predefined condition is not satisfied or per the web crawler's request; and
a plurality of data structures storing information of a plurality of web hosts, a plurality of web crawlers and a plurality of leases between the web hosts and the web crawlers.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27.
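The system and program-product claims recite data structures for hosts, crawlers, and leases, together with update and termination at the lease update time. One possible arrangement is sketched below in Python. Everything named here is hypothetical: the class `LeaseTable`, its three dictionaries, the boolean `condition_met` standing in for the unspecified "predefined condition", and the fair-share rule used to compute the updated share are assumptions chosen for illustration, since the claims do not say how the updated share is calculated.

```python
class LeaseTable:
    """Hypothetical in-memory store for hosts, crawlers, and leases."""

    def __init__(self):
        self.hosts = {}     # host_id -> maximum allowed load
        self.crawlers = {}  # crawler_id -> set of host_ids it leases
        self.leases = {}    # (crawler_id, host_id) -> lease dict

    def add_lease(self, crawler_id, host_id, capacity, update_time, expire_time):
        self.crawlers.setdefault(crawler_id, set()).add(host_id)
        self.leases[(crawler_id, host_id)] = {
            "capacity": capacity,
            "update_time": update_time,
            "expire_time": expire_time,
        }

    def on_update_time(self, crawler_id, host_id, condition_met, now, term=60.0):
        """Update the lease if the condition holds, otherwise terminate it."""
        key = (crawler_id, host_id)
        if not condition_met:
            # Termination: drop the lease so its capacity is free again.
            del self.leases[key]
            self.crawlers[crawler_id].discard(host_id)
            return None
        holders = [k for k in self.leases if k[1] == host_id]
        others = sum(self.leases[k]["capacity"] for k in holders if k != key)
        # Assumed policy: an equal split of the host's maximum load, capped
        # so that the total across all holders never exceeds that maximum.
        fair = self.hosts[host_id] // len(holders)
        share = max(0, min(fair, self.hosts[host_id] - others))
        lease = self.leases[key]
        lease["capacity"] = share
        lease["update_time"] = now + term / 2
        lease["expire_time"] = now + term  # later than the update time `now`
        return lease


table = LeaseTable()
table.hosts["example.com"] = 10
table.add_lease("crawler-a", "example.com", 8, 30.0, 60.0)
table.add_lease("crawler-b", "example.com", 2, 30.0, 60.0)
print(table.on_update_time("crawler-b", "example.com", True, now=30.0))
```

Keying the lease dictionary by the (crawler, host) pair mirrors the claims' one-lease-per-pair framing. The cap applied during an update goes slightly beyond what the claims require at creation time, but it keeps the same invariant in force throughout this sketch.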
Specification