System and method for throttling service requests having non-uniform workloads
First Claim
1. A method, comprising:
performing by one or more computing devices:
receiving a plurality of service requests at a network-based service;
servicing a portion of the plurality of service requests at the network-based service based, at least in part, on a maximum allowable request rate;
determining respective numbers of units of work performed to service individual requests of the portion of the plurality of service requests at the network-based service, wherein the respective numbers of units of work performed to service the individual requests are non-uniform;
comparing, at the network-based service, an observed rate of the respective numbers of units of work performed to service the portion of the plurality of service requests with respect to a target rate determined based, at least in part, on a user specified throughput to service requests at the network-based service;
based, at least in part, on the comparison, adjusting the maximum allowable request rate; and
subsequent to said adjusting:
receiving one or more additional service requests at the network-based service; and
servicing a portion of the additional service requests at the network-based service based, at least in part, on the adjusted maximum allowable request rate.
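The comparison and adjustment steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the fixed fractional `step`, and the `min_rate` floor are all assumptions introduced for illustration. The claim itself says only that the maximum allowable request rate is adjusted based, at least in part, on the comparison.

```python
from typing import Sequence


def observed_work_rate(work_units: Sequence[float], interval_s: float) -> float:
    """Units of work performed per second over one measurement interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return sum(work_units) / interval_s


def adjust_max_request_rate(
    max_rate: float,
    work_units: Sequence[float],
    interval_s: float,
    target_work_rate: float,
    step: float = 0.1,      # assumed fractional adjustment per interval
    min_rate: float = 1.0,  # assumed floor so the service never stops admitting
) -> float:
    """Return a new maximum request rate after comparing the observed work
    rate with the target rate (hypothetical policy, not the patent's)."""
    observed = observed_work_rate(work_units, interval_s)
    if observed > target_work_rate:
        # More work was done than targeted: admit fewer requests.
        return max(min_rate, max_rate * (1 - step))
    if observed < target_work_rate:
        # Less work was done than targeted: admit more requests.
        return max_rate * (1 + step)
    return max_rate


# Four serviced requests with non-uniform costs of 1, 5, 2 and 12 work units
# in one second, against a target of 10 work units per second.
new_rate = adjust_max_request_rate(100.0, [1, 5, 2, 12], 1.0, 10.0)
```

Because the per-request work is non-uniform, the same request rate can produce very different work rates from one interval to the next, which is why this sketch feeds the measured work back into the request-rate limit instead of fixing that limit in advance.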
Abstract
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. The system may determine whether it is operating in an overloaded or underloaded state based on a current work throughput rate, a target work throughput rate, a maximum request rate, or an actual request rate, and may dynamically adjust the maximum request rate in response. For example, if the maximum request rate is being exceeded, the maximum request rate may be raised or lowered, dependent on the current work throughput rate. If the target or committed work throughput rate is being exceeded, but the maximum request rate is not being exceeded, a lower maximum request rate may be proposed. Adjustments to the maximum request rate may be made using multiple incremental adjustments. Service request tokens may be added to a leaky token bucket at the maximum request rate.
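The last two sentences of the abstract can be sketched as follows, under stated assumptions. `TokenBucket`, `capacity`, `step_toward` and `max_step` are hypothetical names, and the sketch models only token refill and consumption; it does not attempt any leak or token-expiry behavior that the specification may define for its "leaky" bucket.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    max_request_rate: float  # tokens added per second
    capacity: float          # assumed cap on stored tokens
    tokens: float = 0.0

    def refill(self, elapsed_s: float) -> None:
        """Add tokens at the current maximum request rate, up to capacity."""
        added = self.max_request_rate * elapsed_s
        self.tokens = min(self.capacity, self.tokens + added)

    def try_admit(self) -> bool:
        """Consume one token per request; reject when none is available."""
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def step_toward(current: float, proposed: float, max_step: float) -> float:
    """Move the rate toward a proposed value by at most max_step, so a large
    change is applied as several incremental adjustments."""
    delta = proposed - current
    if abs(delta) <= max_step:
        return proposed
    return current + max_step if delta > 0 else current - max_step


bucket = TokenBucket(max_request_rate=5.0, capacity=10.0)
bucket.refill(1.0)                                   # one second of tokens
admitted = [bucket.try_admit() for _ in range(6)]    # six back-to-back requests

# Lower the rate toward a proposed value in bounded steps.
bucket.max_request_rate = step_toward(bucket.max_request_rate, 2.0, max_step=1.0)
```

Passing elapsed time explicitly, instead of reading a clock inside `refill`, keeps the sketch deterministic and easy to test; a real service would more likely derive it from a monotonic clock.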
20 Claims
8. A system, comprising:
one or more processors; and
a memory coupled to the one or more processors and storing program instructions that when executed by the one or more processors cause the one or more processors to:
receive a plurality of service requests at a network-based service;
service a portion of the plurality of service requests at the network-based service based, at least in part, on a maximum allowable request rate to service requests at the network-based service;
determine respective numbers of units of work performed to service individual requests of the portion of the plurality of service requests at the network-based service, wherein the respective numbers of units of work performed to service the individual requests are non-uniform;
compare an observed rate of the respective numbers of units of work performed to service the portion of the service requests with respect to a target rate determined based, at least in part, on a user specified throughput to service requests at the network-based service;
based, at least in part, on the comparison, adjust the maximum allowable request rate; and
subsequent to the adjustment:
receive one or more additional service requests at the network-based service; and
service a portion of the additional service requests at the network-based service based, at least in part, on the adjusted maximum allowable request rate.
15. A non-transitory, computer-readable storage medium, storing program instructions that when executed by one or more computing devices cause the one or more computing devices to implement:
receiving a plurality of service requests at a network-based service;
servicing a portion of the plurality of service requests at the network-based service based, at least in part, on a maximum allowable request rate to service requests at the network-based service;
determining respective numbers of units of work performed to service individual requests of the portion of the plurality of service requests at the network-based service, wherein the respective numbers of units of work performed to service the individual requests are non-uniform;
comparing, at the network-based service, an observed rate of the respective numbers of units of work performed to service the portion of the service requests with respect to a target rate determined based, at least in part, on a user specified throughput to service requests at the network-based service;
based, at least in part, on the comparison, adjusting the maximum allowable request rate; and
subsequent to said adjusting:
receiving one or more additional service requests at the network-based service; and
servicing a portion of the additional service requests at the network-based service based, at least in part, on the adjusted maximum allowable request rate.
Specification