Implementing rate controls to limit timeout-based faults
First Claim
1. A computer system comprising the following:
one or more processors;
system memory;
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computing system to implement rate controls to limit faults detected by timeout and to instantiate the following:
a monitor module that identifies one or more hardware or software components that have a potential to experience a timeout-based failure within a time frame, wherein the timeout-based failure is a failure in which the one or more hardware or software components is unresponsive for a specified time or takes longer to perform a task than the specified time, wherein the specified time is specified by a timeout value;
a component failure module that establishes a number of timeout-based failures the one or more hardware or software components are allowed to suffer during the time frame;
a determining module that determines that the number of timeout-based failures suffered by the one or more hardware or software components within the time frame has exceeded the established number; and
a timeout value adjusting module that increases the timeout value by a specified amount of time to ensure that fewer than or equal to the established number of timeout-based failures occur within the time frame.
Abstract
Embodiments are directed to implementing rate controls to limit faults detected by timeout and to learning and adjusting an optimal timeout value. In one scenario, a computer system identifies cloud components that have the potential to fail within a time frame that is specified by a timeout value. The computer system establishes a number of components that are allowed to fail during the time frame specified by the timeout value and further determines that the number of component failures within the time frame specified by the timeout value has exceeded the established number of components that are allowed to fail. In response, the computer system increases the timeout value by a specified amount of time to ensure that fewer than or equal to the established number of components fail within the time frame specified by the timeout value.
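The rate control summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the sliding time frame, the fixed step size, and the choice to reset the failure count after each increase are all hypothetical and are not taken from the patent text.

```python
from collections import deque


class TimeoutRateController:
    """Raise a timeout when too many timeout failures land in one time frame.

    All names and defaults are illustrative assumptions, not the patent's.
    """

    def __init__(self, timeout_s, max_failures, frame_s, step_s):
        self.timeout_s = timeout_s        # current timeout value
        self.max_failures = max_failures  # failures allowed per time frame
        self.frame_s = frame_s            # length of the time frame
        self.step_s = step_s              # the "specified amount" added on each increase
        self._failures = deque()          # timestamps of recent timeout failures

    def record_timeout_failure(self, now_s):
        """Record one timeout failure; return True if the timeout was raised."""
        self._failures.append(now_s)
        # Drop failures that have aged out of the current time frame.
        while self._failures and now_s - self._failures[0] >= self.frame_s:
            self._failures.popleft()
        if len(self._failures) > self.max_failures:
            self.timeout_s += self.step_s
            # Assumption: start a fresh count under the new timeout value.
            self._failures.clear()
            return True
        return False


controller = TimeoutRateController(timeout_s=30.0, max_failures=2, frame_s=60.0, step_s=10.0)
raised = [controller.record_timeout_failure(t) for t in (0.0, 5.0, 10.0)]
```

With two failures allowed per 60-second frame, the third failure inside the frame is the one that triggers the increase, so the timeout moves from 30 to 40 seconds.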
20 Claims
1. A computer system comprising the following:
one or more processors;
system memory;
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computing system to implement rate controls to limit faults detected by timeout and to instantiate the following:
a monitor module that identifies one or more hardware or software components that have a potential to experience a timeout-based failure within a time frame, wherein the timeout-based failure is a failure in which the one or more hardware or software components is unresponsive for a specified time or takes longer to perform a task than the specified time, wherein the specified time is specified by a timeout value;
a component failure module that establishes a number of timeout-based failures the one or more hardware or software components are allowed to suffer during the time frame;
a determining module that determines that the number of timeout-based failures suffered by the one or more hardware or software components within the time frame has exceeded the established number; and
a timeout value adjusting module that increases the timeout value by a specified amount of time to ensure that fewer than or equal to the established number of timeout-based failures occur within the time frame.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer system comprising the following:
one or more processors;
system memory;
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computing system to perform learning and adjusting of a timeout value and to instantiate the following:
a monitor module that monitors a number of timeout-based failures of one or more hardware or software components that occur due to timeout during a specified timeframe, the timeouts being defined by the timeout value, and wherein the timeout value is based on monitored time distributions for at least one of the following: application deployments, application updates, virtual machine migrations or node power-downs;
a determining module that determines that the timeout value is too high or too low based on the determined number of timeout-based failures that occurred due to timeout during the specified timeframe; and
a timeout value adjusting module that adjusts the timeout value to ensure that fewer than or equal to a specified number of timeout-based failures occur during the specified timeframe.
Dependent claims: 16, 17, 18, 19
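One way to read "too high or too low" against a monitored time distribution is sketched below. It is a hedged illustration, not the claimed method: the function names, the rule of picking the smallest observed duration that keeps timeouts within the allowed number, and the `slack_s` tolerance are all assumptions introduced here.

```python
def failures_at(timeout_s, durations_s):
    """Count operations that would have timed out under timeout_s."""
    return sum(1 for d in durations_s if d > timeout_s)


def learn_timeout(durations_s, max_failures):
    """Smallest observed duration that, used as the timeout, yields <= max_failures timeouts."""
    if not durations_s:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_s)
    # Allowing k failures means the k slowest operations may exceed the timeout.
    index = max(len(ordered) - 1 - max_failures, 0)
    return ordered[index]


def assess(timeout_s, durations_s, max_failures, slack_s=0.0):
    """Classify the current timeout and return (verdict, suggested timeout)."""
    target = learn_timeout(durations_s, max_failures)
    if failures_at(timeout_s, durations_s) > max_failures:
        return "too low", target
    # Assumption: a timeout is "too high" if it exceeds the learned value by more than slack_s.
    if timeout_s > target + slack_s:
        return "too high", target
    return "ok", target


# Hypothetical durations (seconds) for one operation type, e.g. VM migrations.
observed = [12.0, 15.0, 14.0, 40.0, 13.0, 90.0]
verdict, suggested = assess(20.0, observed, max_failures=1)
```

Here a 20-second timeout would have failed two of the six operations while only one failure is allowed, so it is classified as too low and 40 seconds is suggested instead.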
20. A computer system comprising the following:
one or more processors;
system memory;
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computing system to perform a method for learning and adjusting a timeout value and to instantiate the following:
a monitor module that monitors one or more hardware or software components for a number of hardware or software timeout-based failures of the one or more hardware or software components that occur due to timeout during a specified timeframe, each of the timeout-based failures comprising a failure in which the one or more hardware or software components is unresponsive for a specified time or takes longer to perform a task than the specified time, wherein the specified time is specified by the timeout value;
a determining module that determines that the timeout value is too high or too low based on the determined number of failures that occurred due to timeout during the specified timeframe; and
a timeout value adjusting module that adjusts the timeout value to ensure that fewer than or equal to a specified number of failures occur during the specified timeframe; and
wherein the computer system adjusts a number of retries that are permitted to occur for the one or more hardware or software components, each retry comprising an attempted restart for a corresponding hardware or software component.
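Claim 20 adds an adjustable retry budget, where each retry is an attempted restart. The claim does not say how the budget is adjusted, so the sketch below uses an assumed heuristic (fewer retries when timeout failures exceed the allowed number, more when they fall below it). The function names, the `floor` and `ceiling` bounds, and the step of one are all hypothetical.

```python
def adjust_retries(max_retries, failures_in_frame, allowed_failures, floor=0, ceiling=5):
    """Return a new retry budget (assumed heuristic, not specified by the claim)."""
    if failures_in_frame > allowed_failures:
        # Assumption: too many timeouts, so permit fewer restart attempts.
        return max(max_retries - 1, floor)
    if failures_in_frame < allowed_failures:
        # Assumption: failures are under budget, so permit one more attempt.
        return min(max_retries + 1, ceiling)
    return max_retries


def restart_with_retries(restart, max_retries):
    """Attempt restart() up to max_retries times.

    restart is a caller-supplied callable returning True on success.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        if restart():
            return attempt
    return None


# Hypothetical component that comes back on its third restart attempt.
outcomes = iter([False, False, True])
attempts_used = restart_with_retries(lambda: next(outcomes), max_retries=3)
```

In this run the component recovers on the third attempt; with a budget of two retries the same component would have been reported as failed.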
Specification