PERFORMANCE METRIC COLLECTION AND AUTOMATED ANALYSIS
Abstract
A metric monitoring and analysis system including dynamic sampling agents located in monitored system elements and a service management platform. Each sampling agent includes a data adapter collecting metric data in a common format, a threshold generator for determining dynamic metric threshold ranges, an alarm detector generating an indicator when a metric deviates outside a dynamic threshold range or a static threshold, and a deviation tracker generating alarm severity scores. The service platform includes an alarm analyzer identifying root causes of system alarm conditions by correlation of grouped metrics or by forensic analysis of temporally or statistically correlated secondary forensic data or data items from a service model of the system.
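The per-agent pipeline described in the abstract (threshold generator, alarm detector, deviation tracker) can be sketched for a single metric as below. This is a minimal illustration, not the patented method: the abstract does not say how the dynamic range or the severity score is computed, so the rule used here (mean plus or minus `k` population standard deviations of the previous period, with severity accumulated as the overshoot normalised by the band width) is an assumption, and the names `SamplingAgent`, `dynamic_threshold`, `start_period` and `observe` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class ThresholdRange:
    low: float
    high: float


def dynamic_threshold(history: List[float], k: float = 3.0) -> ThresholdRange:
    # Assumed rule: mean +/- k population standard deviations of the
    # samples seen in the previous time period.
    mu = mean(history)
    sigma = pstdev(history)
    return ThresholdRange(mu - k * sigma, mu + k * sigma)


@dataclass
class SamplingAgent:
    """Hypothetical single-metric agent combining a threshold generator,
    an alarm condition detector and a deviation tracker."""
    k: float = 3.0
    static_high: Optional[float] = None  # optional fixed ceiling (static threshold)
    threshold: Optional[ThresholdRange] = None
    severity: float = 0.0  # score of the alarm event in progress

    def start_period(self, previous_period: List[float]) -> None:
        # Recompute the dynamic range once per successive time period.
        self.threshold = dynamic_threshold(previous_period, self.k)

    def observe(self, value: float) -> Optional[float]:
        """Return the running severity score if `value` raises an alarm,
        or None if it lies inside all thresholds (which ends the event)."""
        if self.threshold is None:
            raise RuntimeError("start_period() must be called first")
        t = self.threshold
        if value > t.high:
            excess = value - t.high
        elif value < t.low:
            excess = t.low - value
        elif self.static_high is not None and value > self.static_high:
            excess = value - self.static_high
        else:
            self.severity = 0.0  # back in range: reset the deviation tracker
            return None
        # Normalise by the band width so scores are comparable across
        # metrics; fall back to 1.0 when the history had zero spread.
        width = (t.high - t.low) or 1.0
        self.severity += excess / width
        return self.severity
```

For example, after `start_period([9, 11])` the range is 7 to 13 (mean 10, standard deviation 1, `k = 3`). `observe(12)` returns `None`, `observe(16)` returns `0.5`, and a following `observe(19)` returns `1.5` because the score keeps accumulating while the alarm persists. Accumulating rather than reporting a single overshoot is one way to let a long, moderate excursion outrank a brief spike.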
18 Claims
1. A method for performing system metric analysis, comprising the steps of:
collecting metric data representing system operations from a plurality of system sources, determining a dynamic metric threshold range for each metric over successive time periods, indicating a metric alarm event by generating a metric threshold alarm indicator when a corresponding metric value deviates outside the dynamic threshold range, generating an alarm severity score for each metric alarm event, and performing a root cause analysis identifying a basic cause of a system alarm condition by at least one of correlation of grouped metrics with alarm conditions, and forensic analysis of selected secondary forensic data items recorded upon the occurrence of at least selected alarm conditions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A metric monitoring and analysis system, comprising:
a plurality of dynamic sampling agents, each dynamic sampling agent being located in a system element containing a metric of interest for monitoring and analysis and including at least one plurality of data collectors for collecting metric data representing system operations from a plurality of system sources, a threshold generator for determining a dynamic metric threshold range for each metric over successive time periods, an alarm condition detector indicating a metric alarm event by generating a metric threshold alarm indicator when a corresponding metric value deviates outside the dynamic threshold range, and a deviation tracker for generating an alarm severity score for each metric alarm event, and a single service management platform receiving alarm indicators and severity scores from the dynamic sampling agents and including an alarm analyzer performing a root cause analysis identifying a basic cause of a system alarm condition by at least one of correlation of grouped metrics with alarm conditions, and forensic analysis of selected secondary forensic data items recorded upon the occurrence of at least one or more selected alarm conditions.
Dependent claims: 12, 13, 14, 15, 16, 17, 18.
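The "correlation of grouped metrics with alarm conditions" branch of the root cause analysis recited in claims 1 and 11 can be illustrated as follows. The claims do not name a correlation measure, so this sketch assumes Pearson correlation between each metric's samples and a 0/1 series marking when the system alarm condition was active, then ranks each metric group by its best-correlated member. The function names, group names and metric names are hypothetical, and the forensic-analysis branch is not shown.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_root_causes(
    groups: Dict[str, Dict[str, List[float]]],
    alarm: List[int],
) -> List[Tuple[str, str, float]]:
    """Rank metric groups by their best-correlated metric.

    groups: group name -> {metric name -> samples aligned with `alarm`}
            (each group is assumed to contain at least one metric)
    alarm:  1 where the system alarm condition was active, else 0
    Returns (group, metric, |r|) tuples, most likely cause first.
    """
    best = []
    for group, metrics in groups.items():
        # Score every metric in the group against the alarm series.
        scored = [(name, abs(pearson(series, alarm)))
                  for name, series in metrics.items()]
        name, score = max(scored, key=lambda s: s[1])
        best.append((group, name, score))
    return sorted(best, key=lambda s: s[2], reverse=True)
```

With `alarm = [0, 0, 1, 1, 0]`, a hypothetical `"disk"` group whose `queue_depth` samples are `[1, 1, 5, 6, 1]` ranks ahead of a `"network"` group whose `latency` samples are `[2, 3, 2, 3, 2]`, because the queue depth rises exactly when the alarm is active. Taking the absolute value treats a metric that drops during the alarm as just as suspicious as one that spikes.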
Specification