Automated adaptive baselining and thresholding method and system
Abstract
A system and method are presented for automatically constructing a baseline for an attribute of a monitored system, calculating a threshold based on the constructed baseline, and feeding the threshold back into the monitored system. In accordance with the invention, a metric corresponding to an attribute of interest of a monitored system is extracted and compared with a current normal threshold associated with the attribute. An event notification is generated if the extracted metric is not within a limit defined by the current normal threshold. A baseline is calculated based on a relevant subset of extracted metrics, and a new current normal threshold is calculated from that baseline. The current normal threshold is then replaced with the new current normal threshold. In preferred embodiments, an alarm is generated if one or more event notifications meet the conditions of pre-specified rules. Newly calculated current normal thresholds are limited to a service level limit, which defines a boundary of the acceptable level of operation of the attribute, and a service level exception is generated if the newly calculated current normal threshold is not within that limit. Reports are generated which summarize the performance of the monitored attributes, indicate which monitored attributes are out-of-control, and prioritize the order in which out-of-control attributes receive available engineering resources.
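The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `AdaptiveThreshold`, the rolling window, and the "mean plus k standard deviations" rule are all assumptions chosen for illustration, since the abstract does not say how the baseline or the threshold is computed. The sketch also assumes an upper limit only, and treats the most recent `window` samples as the "relevant subset".

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Upper-limit threshold that is re-derived from a rolling baseline."""

    def __init__(self, initial_threshold, window=20, k=3.0):
        self.threshold = initial_threshold   # current normal threshold
        self.history = deque(maxlen=window)  # assumed "relevant subset" of metrics
        self.k = k                           # assumed width, in standard deviations

    def observe(self, metric):
        """Compare one metric, then recompute and install a new threshold.

        Returns True when the metric exceeds the threshold that was in force
        when it arrived, i.e. when an event notification would be generated.
        """
        event = metric > self.threshold      # threshold comparator
        self.history.append(metric)          # input to the statistical analyzer
        if len(self.history) >= 2:
            baseline = mean(self.history)    # baseline (assumed: rolling mean)
            spread = pstdev(self.history)
            # threshold processor and implementor: replace the old limit
            self.threshold = baseline + self.k * spread
        return event


monitor = AdaptiveThreshold(initial_threshold=100.0, window=5, k=2.0)
events = [monitor.observe(m) for m in [50, 52, 48, 51, 49, 90]]
print(events)  # [False, False, False, False, False, True]
```

Note that this sketch folds the outlying sample (90) into the next baseline, which widens the following threshold. A fuller treatment of the "relevant subset" might exclude such samples; the abstract leaves that choice open.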
20 Claims
-
1. A fault detection system which monitors a plurality of attributes of a monitored system, comprising:
a threshold comparator configured with a current normal threshold associated with an attribute of interest which receives a metric corresponding to said attribute of interest, compares said metric to said current normal threshold, and generates an event notification if said metric is not within a limit defined by said current normal threshold, said attribute being one of said plurality of attributes of said monitored system;
a statistical analyzer, coupled to receive said metric, which calculates a baseline associated with said attribute of interest based on a relevant subset of said metric and a set of previously collected metrics corresponding to said attribute of interest;
a threshold processor coupled to receive said baseline which calculates a new current normal threshold associated with said attribute of interest in realtime based on said baseline; and
a threshold implementor coupled to receive said new current normal threshold which reconfigures said threshold comparator in realtime from said current normal threshold to said new current normal threshold.
-
2. A fault detection system in accordance with claim 1, comprising:
a data collector which extracts a metric corresponding to an attribute of interest from a monitored system.
-
3. A fault detection system in accordance with claim 1, comprising:
an event processor coupled to receive said event notification which generates an alarm.
-
4. A fault detection system in accordance with claim 3, wherein:
said event processor comprises a rules filter which filters said event notification according to at least one rule, said at least one rule defining a pre-determined condition on which to generate said alarm.
-
5. A fault detection system in accordance with claim 1, comprising:
a report generator operable to identify, from said event notification, one or more of said attributes which are adversely affecting performance of said monitored system.
-
6. A fault detection system in accordance with claim 5, wherein:
said report generator is configured to prioritize said one or more attributes of said monitored system in order of those which are adversely affecting said performance of said monitored system the most.
-
7. A fault detection system in accordance with claim 1, comprising:
a sanity checker coupled to said threshold processor which limits said new current normal threshold to a service level limit which defines a boundary of an acceptable level of operation if said new current normal threshold is not within said service level limit.
-
8. A fault detection system in accordance with claim 7, comprising:
a service level exception generator which generates a service level exception if said new current normal threshold is limited to said service level limit.
-
9. A fault detection system in accordance with claim 8, comprising:
a report generator operable to identify, from said service level exception, one or more of said attributes which are adversely affecting performance of said monitored system.
-
10. A fault detection system in accordance with claim 9, wherein:
said report generator is configured to prioritize said one or more attributes of said monitored system in order of those which are adversely affecting said performance of said monitored system the most.
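The sanity checker of claim 7 and the service level exception generator of claim 8 can be sketched as a single clamping function. This is an illustrative assumption, not the claimed implementation: the names `sanity_check` and `ServiceLevelException` are hypothetical, and the sketch assumes an upper-bound threshold, so "not within said service level limit" is read as "greater than the limit".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ServiceLevelException:
    attribute: str
    proposed: float   # threshold requested by the threshold processor
    limit: float      # service level limit it was clamped to


def sanity_check(attribute: str, proposed: float, service_level_limit: float
                 ) -> Tuple[float, Optional[ServiceLevelException]]:
    """Clamp an upper threshold to the service level limit.

    Returns the threshold to install and, only if clamping occurred,
    a service level exception record for later reporting.
    """
    if proposed <= service_level_limit:
        return proposed, None
    exc = ServiceLevelException(attribute, proposed, service_level_limit)
    return service_level_limit, exc


threshold, exc = sanity_check("cpu_utilization", proposed=120.0,
                              service_level_limit=95.0)
print(threshold, exc is not None)  # 95.0 True
```

Returning the exception alongside the clamped value, rather than raising it, keeps the threshold update going while still leaving a record that a report generator could consume.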
-
11. A method for identifying problem areas of a monitored system, comprising the steps of:
extracting a metric corresponding to an attribute of interest from said monitored system;
comparing said extracted metric with a current normal threshold associated with said attribute of interest;
providing an event notification if said extracted metric is not within a limit defined by said current normal threshold;
calculating a baseline based on a relevant subset of said extracted metric and a set of previously collected extracted metrics;
calculating a new current normal threshold associated with said attribute of interest in realtime based on said baseline; and
reconfiguring said current normal threshold in realtime with said new current normal threshold.
-
12. A method in accordance with claim 11, comprising the step of:
extracting said metric corresponding to an attribute of interest from said monitored system.
-
13. A method in accordance with claim 11, comprising the step of:
generating an alarm in response to said event notification.
-
14. A method in accordance with claim 13, comprising the steps of:
filtering said event notification according to at least one rule, said at least one rule defining a pre-determined condition on which to generate said alarm.
-
15. A method in accordance with claim 11, comprising the steps of:
identifying, from said event notification, one or more of said attributes which are adversely affecting performance of said monitored system.
-
16. A method in accordance with claim 15, comprising the steps of:
prioritizing said one or more attributes of said monitored system in order of those which are adversely affecting said performance of said monitored system the most.
-
17. A method in accordance with claim 11, comprising the step of:
limiting said new current normal threshold to a service level limit which defines a boundary of an acceptable level of operation if said new current normal threshold is not within said service level limit.
-
18. A method in accordance with claim 17, comprising the step of:
generating a service level exception if said new current normal threshold is limited to said service level limit.
-
19. A method in accordance with claim 18, comprising the steps of:
identifying, from said service level exception, one or more attributes which are adversely affecting performance of said monitored system.
-
20. A method in accordance with claim 19, wherein:
prioritizing said one or more attributes of said monitored system in order of those which are adversely affecting said performance of said monitored system the most.
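The rule-based alarm filtering (claims 13 and 14) and the prioritization steps (claims 15, 16, 19 and 20) can be sketched as follows. The claims do not say what rule is applied or how "adversely affecting performance the most" is measured, so both choices here are assumptions: the rule is "at least `min_count` notifications for one attribute", and priority is the notification count. The function names `alarms` and `prioritize` are hypothetical.

```python
from collections import Counter


def alarms(notifications, min_count=3):
    """Rules filter: alarm on attributes with at least min_count notifications."""
    counts = Counter(notifications)
    return {attr for attr, n in counts.items() if n >= min_count}


def prioritize(notifications):
    """Rank attributes by number of notifications, most affected first."""
    return [attr for attr, _ in Counter(notifications).most_common()]


# Each entry names the attribute that produced one event notification
# (or one service level exception, for claims 19 and 20).
notifications = ["disk", "cpu", "cpu", "mem", "cpu", "disk"]
print(alarms(notifications))      # {'cpu'}
print(prioritize(notifications))  # ['cpu', 'disk', 'mem']
```

The same two functions work on a list of service level exceptions, which is one way to read the parallel between claims 15 and 16 and claims 19 and 20.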
Specification