Automated and adaptive threshold setting
Abstract
A method for managing a computer system includes monitoring first violations of a service level objective (SLO) of a service running on the computer system so as to determine a first statistical behavior of the first violations. Second violations of a component performance threshold of a component of the computer system are monitored so as to determine a second statistical behavior of the second violations. A model that predicts the second statistical behavior based on the first statistical behavior is produced. The component performance threshold is automatically adjusted responsively to the model, so as to improve a prediction of the first violations by the second violations.
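As a rough illustration of what "improving a prediction of the first violations by the second violations" could mean in practice, the sketch below scores threshold alerts as predictors of SLO violations over a set of paired observation intervals. The function name `prediction_quality`, the interval-by-interval pairing, and the choice of positive and negative predictive value as the score are assumptions made for this example; the abstract does not name a specific metric.

```python
from typing import Sequence, Tuple


def prediction_quality(
    alerts: Sequence[bool], slo_violations: Sequence[bool]
) -> Tuple[float, float]:
    """Return (PPV, NPV) of threshold alerts as predictors of SLO violations.

    Hypothetical helper: each position is one observation interval.
    alerts[i] is True if the component threshold was crossed in interval i,
    slo_violations[i] is True if the SLO was violated in that interval.
    """
    if len(alerts) != len(slo_violations):
        raise ValueError("both sequences must cover the same intervals")

    pairs = list(zip(alerts, slo_violations))
    tp = sum(1 for a, v in pairs if a and v)          # alert, SLO violated
    fp = sum(1 for a, v in pairs if a and not v)      # alert, SLO fine
    tn = sum(1 for a, v in pairs if not a and not v)  # no alert, SLO fine
    fn = sum(1 for a, v in pairs if not a and v)      # no alert, SLO violated

    # Undefined ratios are reported as NaN rather than guessed.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Toy data: four intervals, two alerts, one SLO violation.
ppv, npv = prediction_quality(
    [True, True, False, False],
    [True, False, False, False],
)
```

In this toy run half of the alerts coincide with an SLO violation, which suggests a threshold that fires too often; adjusting it is what the claims below address.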
26 Claims
1. A method, comprising:
in a computer system that includes at least one component having a performance threshold, transmitting alerts from the computer system to a monitoring system when a performance measure of the at least one component crosses the performance threshold;
detecting, using the monitoring system, violations of a Service Level Objective (SLO) defined for a service running on the computer system;
based on the alerts transmitted to the monitoring system when the performance threshold is crossed, producing a model that predicts future violations of the SLO responsively to the alerts and has a prediction reliability; and
automatically adjusting the performance threshold of the component responsively to the model, wherein producing the model comprises fitting a first sequence comprising historical values of the violations of the SLO and a second sequence comprising historical values of the performance threshold, and wherein automatically adjusting the performance threshold of the component comprises calculating an updated threshold value based on the fitted sequences and a pre-specified prediction reliability.
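The fitting step in claim 1 can be pictured with a small sketch. The claim does not say which kind of fit is used, so this example swaps in an ordinary least-squares line as the simplest possible stand-in: it relates historical threshold values to an indicator of whether the SLO was violated when an alert fired at that threshold, then inverts the line to find the threshold at which the fitted violation likelihood equals a pre-specified reliability. The names `fit_line` and `updated_threshold`, the 0/1 encoding of the violation history, and the linear model itself are all assumptions for illustration, not the patented method.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ~ a + b * xs; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("threshold history needs at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def updated_threshold(
    threshold_history: Sequence[float],
    violation_history: Sequence[float],
    target_reliability: float,
) -> float:
    """Threshold at which the fitted violation likelihood hits the target.

    violation_history[i] is assumed to be 1.0 if an alert raised at
    threshold_history[i] coincided with an SLO violation, else 0.0.
    """
    a, b = fit_line(threshold_history, violation_history)
    if b == 0:
        raise ValueError("fit is flat; the history gives no basis for an update")
    # Invert a + b * threshold = target_reliability.
    return (target_reliability - a) / b


# Toy history: alerts at low thresholds were false alarms, high ones were not.
threshold_history = [50.0, 60.0, 70.0, 80.0]
violation_history = [0.0, 0.0, 1.0, 1.0]
new_threshold = updated_threshold(threshold_history, violation_history, 0.9)
# new_threshold comes out at roughly 75 for this toy history
```

A straight line is a crude model for a 0/1 outcome and can produce values outside the 0 to 1 range; a real implementation would more plausibly use a bounded model, but the inversion step would follow the same pattern.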
7. Apparatus for managing a computer system that includes at least one component having a performance threshold, the apparatus comprising:
an interface, which is coupled to receive from the computer system alerts when the performance measure of the at least one component crosses the performance threshold, and to further receive from the computer system detected violations of a Service Level Objective (SLO) defined for a service running on the computer system; and
a processor, which is arranged to produce, based on the alerts received from the computer system when the performance threshold is crossed, a model that predicts future violations of the SLO responsively to the alerts and has a prediction reliability, and to automatically adjust the performance threshold of the component responsively to the model, wherein the processor is arranged to fit a first sequence comprising historical values of the violations of the SLO and a second sequence comprising historical values of the performance threshold in order to produce the model, and to calculate an updated threshold value based on the fitted sequences and a pre-specified prediction reliability.
13. A method, comprising:
in a computer system that includes at least one component having a performance threshold, transmitting alerts from the computer system to a monitoring system when a performance measure of the at least one component crosses the performance threshold;
detecting, using the monitoring system, violations of a Service Level Objective (SLO) defined for a service running on the computer system;
based on the alerts transmitted to the monitoring system when the performance threshold is crossed, producing a model that predicts future violations of the SLO responsively to the alerts and has a prediction reliability; and
automatically adjusting the performance threshold of the component responsively to the model, wherein producing the model comprises assigning a first counter to count a first number of occurrences in which the performance threshold is crossed, a second counter to count a second number of the occurrences in which the performance threshold is not crossed, a third counter to count a third number of the occurrences in which the SLO is violated, and a fourth counter to count a fourth number of the occurrences in which the SLO is not violated, and wherein adjusting the performance threshold comprises setting the performance threshold responsively to the first, second, third and fourth counters.
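The four counters of claim 13 can be sketched as a small data structure. The claim states only that the threshold is set "responsively to" the counters, so the update rule below is an assumption chosen for illustration: it compares how often the threshold is crossed with how often the SLO is violated, and nudges the threshold by a fixed step so that the two rates move closer together. The class `Counters`, the function `adjust_threshold`, the fixed `step`, and the convention that a higher threshold produces fewer alerts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    crossed: int = 0       # first counter: threshold crossed
    not_crossed: int = 0   # second counter: threshold not crossed
    violated: int = 0      # third counter: SLO violated
    not_violated: int = 0  # fourth counter: SLO not violated

    def record(self, threshold_crossed: bool, slo_violated: bool) -> None:
        """Update all four counters for one observed occurrence."""
        if threshold_crossed:
            self.crossed += 1
        else:
            self.not_crossed += 1
        if slo_violated:
            self.violated += 1
        else:
            self.not_violated += 1


def adjust_threshold(threshold: float, c: Counters, step: float) -> float:
    """Assumed rule: move the threshold so alert rate tracks violation rate."""
    alert_total = c.crossed + c.not_crossed
    slo_total = c.violated + c.not_violated
    if alert_total == 0 or slo_total == 0:
        return threshold  # nothing observed yet, leave the threshold alone
    alert_rate = c.crossed / alert_total
    violation_rate = c.violated / slo_total
    if alert_rate > violation_rate:
        return threshold + step  # too many alerts: make the threshold stricter
    if alert_rate < violation_rate:
        return threshold - step  # too few alerts: make the threshold looser
    return threshold


# Ten occurrences: the threshold is crossed in five, the SLO violated in two.
counters = Counters()
for i in range(10):
    counters.record(threshold_crossed=i < 5, slo_violated=i < 2)
new_threshold = adjust_threshold(80.0, counters, step=5.0)
```

Here the alert rate (5 of 10) exceeds the violation rate (2 of 10), so the sketch raises the threshold from 80.0 to 85.0.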
18. Apparatus for managing a computer system that includes at least one component having a performance threshold, the apparatus comprising:
an interface, which is coupled to receive from the computer system alerts when the performance measure of the at least one component crosses the performance threshold, and to further receive from the computer system detected violations of a Service Level Objective (SLO) defined for a service running on the computer system; and
a processor, which is arranged to produce, based on the alerts received from the computer system when the performance threshold is crossed, a model that predicts future violations of the SLO responsively to the alerts and has a prediction reliability, and to automatically adjust the performance threshold of the component responsively to the model, wherein the processor is arranged to produce the model by assigning a first counter to count a first number of occurrences in which the performance threshold is crossed, a second counter to count a second number of the occurrences in which the performance threshold is not crossed, a third counter to count a third number of the occurrences in which the SLO is violated, and a fourth counter to count a fourth number of the occurrences in which the SLO is not violated, and to adjust the performance threshold responsively to the first, second, third and fourth counters.
23. A method for performing an interactive analysis of a computer system, which includes at least one component having a performance threshold, to devise an information technology solution applicable to the computer system, the method comprising:
transmitting alerts from the computer system to a monitoring system when a performance measure of the at least one component crosses the performance threshold;
detecting, using the monitoring system, violations of a Service Level Objective (SLO) defined for a service running on the computer system;
based on the alerts transmitted to the monitoring system when the performance threshold is crossed, producing a model that predicts future violations of the SLO responsively to the alerts and has a prediction reliability; and
automatically adjusting the performance threshold of the component responsively to the model, wherein producing the model comprises fitting a first sequence comprising historical values of the violations of the SLO and a second sequence comprising historical values of the performance threshold, and wherein automatically adjusting the performance threshold of the component comprises calculating an updated threshold value based on the fitted sequences and a pre-specified prediction reliability.
Specification