AUTOMATIC ROOT CAUSE ANALYSIS OF PERFORMANCE PROBLEMS USING AUTO-BASELINING ON AGGREGATED PERFORMANCE METRICS
Abstract
Anomalous behavior in a distributed system is automatically detected. Metrics are gathered for transactions, subsystems and/or components of the subsystems. The metrics can identify response times, error counts and/or CPU loads, for instance. Baseline metrics and associated deviation ranges are determined automatically and can be updated periodically. Metrics from specific transactions are compared to the baseline metrics to determine whether an anomaly has occurred. A drill-down approach can be used so that metrics for a subsystem are not examined unless the metrics for an associated transaction indicate an anomaly. Further, metrics for a component, an application that includes one or more components, or a process that includes one or more applications are not examined unless the metrics for an associated subsystem indicate an anomaly. Multiple subsystems can report the metrics to a central manager, which can correlate the metrics to transactions using transaction identifiers or other transaction context data.
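The drill-down and correlation steps described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation: it assumes each report carries a transaction identifier, a level label, an entity name and a response time, that deviation ranges are already known (fixed, made-up values here), and that a simple parent-to-children map describes which subsystems and components belong to which transaction. The names `MetricReport`, `correlate`, `drill_down`, `ranges` and `children` are all assumptions introduced for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MetricReport:
    transaction_id: str      # transaction identifier used for correlation
    level: str               # "transaction", "subsystem" or "component" (hypothetical labels)
    name: str                # entity name, e.g. "checkout" or "db-tier" (hypothetical)
    response_time_ms: float

def correlate(reports):
    """Central manager: group reports from many subsystems by transaction id."""
    by_txn = defaultdict(list)
    for r in reports:
        by_txn[r.transaction_id].append(r)
    return by_txn

def outside_range(name, value, ranges):
    low, high = ranges[name]
    return not (low <= value <= high)

def drill_down(txn_reports, ranges, children):
    """Return anomalous entities, examining a child only if its parent is anomalous."""
    values = {r.name: r.response_time_ms for r in txn_reports}
    anomalies = []

    def visit(name):
        if name not in values or name not in ranges:
            return  # nothing reported, or no deviation range for this entity
        if not outside_range(name, values[name], ranges):
            return  # within range: do not drill further
        anomalies.append(name)
        for child in children.get(name, []):
            visit(child)

    # Start at the transaction level and descend only along anomalous branches.
    for r in txn_reports:
        if r.level == "transaction":
            visit(r.name)
    return anomalies

# Hypothetical deviation ranges (ms) and containment hierarchy.
ranges = {"checkout": (100, 500), "web-tier": (50, 200), "db-tier": (50, 300),
          "CartServlet": (20, 100), "SqlQuery": (20, 250)}
children = {"checkout": ["web-tier", "db-tier"],
            "web-tier": ["CartServlet"], "db-tier": ["SqlQuery"]}
reports = [
    MetricReport("t1", "transaction", "checkout", 900),
    MetricReport("t1", "subsystem", "web-tier", 150),
    MetricReport("t1", "component", "CartServlet", 140),  # out of range, but never examined
    MetricReport("t1", "subsystem", "db-tier", 700),
    MetricReport("t1", "component", "SqlQuery", 650),
    MetricReport("t2", "transaction", "checkout", 300),
    MetricReport("t2", "subsystem", "db-tier", 900),      # skipped: transaction is normal
]
for txn_id, txn_reports in correlate(reports).items():
    print(txn_id, drill_down(txn_reports, ranges, children))
```

For `t1` this reports `checkout`, `db-tier` and `SqlQuery`; `CartServlet` is never checked because its parent `web-tier` is within range. For `t2` nothing is reported, since the transaction itself is normal. Descending only along anomalous branches is what keeps the drill-down cheap when most transactions are healthy.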
113 Citations
36 Claims
1. A computer-implemented method for detecting anomalous behavior in a distributed system, comprising:
receiving data from a plurality of subsystems in the distributed system which identifies metrics for the plurality of subsystems when the plurality of subsystems perform processing for a plurality of execution paths in the distributed system;
automatically establishing baseline metrics for the subsystems, responsive to the received data;
receiving data from particular subsystems of the plurality of subsystems which identifies metrics for the particular subsystems when the particular subsystems perform processing for a particular execution path in the distributed system;
determining if the metrics for the particular subsystems are anomalous based on the baseline metrics; and
reporting, responsive to the determining.
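The steps of claim 1 can be sketched in a few lines. This is a hypothetical illustration only: the claim does not say how baselines are computed, so the sketch assumes a baseline is the per-subsystem mean plus or minus `k` population standard deviations (with `k = 3` chosen arbitrarily), and that "reporting" means printing. The function names and the sample data are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def establish_baselines(samples, k=3.0):
    """Auto-baseline (assumed rule): per-subsystem mean +/- k standard deviations.

    samples: iterable of (subsystem, metric_value) pairs collected while the
    subsystems process many different execution paths.
    """
    grouped = defaultdict(list)
    for subsystem, value in samples:
        grouped[subsystem].append(value)
    baselines = {}
    for subsystem, values in grouped.items():
        mu = mean(values)
        sigma = pstdev(values)
        baselines[subsystem] = (mu - k * sigma, mu + k * sigma)
    return baselines

def find_anomalies(path_metrics, baselines):
    """Compare one execution path's per-subsystem metrics with the baselines."""
    anomalies = {}
    for subsystem, value in path_metrics.items():
        if subsystem not in baselines:
            continue  # no baseline yet, so this subsystem cannot be judged
        low, high = baselines[subsystem]
        if not low <= value <= high:
            anomalies[subsystem] = value
    return anomalies

def report(path_id, anomalies):
    """Report the result of the determining step (here, simply print it)."""
    if anomalies:
        for subsystem, value in sorted(anomalies.items()):
            print(f"path {path_id}: {subsystem} anomalous ({value} ms)")
    else:
        print(f"path {path_id}: all subsystems within baseline")

# Hypothetical history gathered over many execution paths (response times in ms).
history = [("web", 100), ("web", 110), ("web", 90),
           ("db", 40), ("db", 50), ("db", 45)]
baselines = establish_baselines(history)

# Metrics from the subsystems that handled one particular execution path.
path_metrics = {"web": 105, "db": 80}
report("p-42", find_anomalies(path_metrics, baselines))
```

Here the `db` subsystem is flagged because 80 ms lies well outside its band of roughly 33 to 57 ms, while `web` stays within its band. A mean-and-deviation rule is only one plausible choice; with very few or identical samples the band collapses to a point, so a production system would likely need a minimum sample count or a floor on the deviation.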
17. A computer-implemented method for detecting anomalous behavior at a host, comprising:
providing metrics for a plurality of components of at least one host, the plurality of components are associated with a plurality of execution paths;
automatically establishing baseline metrics for the plurality of components based on the provided metrics for the plurality of components;
providing metrics for particular components of the at least one host, the particular components are in a particular execution path;
determining if the metrics for the particular components are anomalous based on the baseline metrics; and
reporting, responsive to the determining.
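For the host-level method of claim 17, the abstract notes that baselines can be updated periodically. One way to do that without storing every sample is an incremental mean and variance (Welford's online algorithm), kept per component. The sketch below assumes that technique, an anomaly rule of "more than `k` standard deviations from the running mean", and a hypothetical `min_samples` warm-up; `RunningBaseline`, `HostBaselines` and the component names are illustrative, not taken from the patent.

```python
import math
from dataclasses import dataclass

@dataclass
class RunningBaseline:
    """Incrementally updated mean/variance (Welford's algorithm) for one component."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self) -> float:
        # Population standard deviation of all values seen so far.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

    def is_anomalous(self, value: float, k: float = 3.0, min_samples: int = 10) -> bool:
        if self.count < min_samples:
            return False  # hypothetical warm-up: too little history to judge
        return abs(value - self.mean) > k * self.stddev()

class HostBaselines:
    """Per-component baselines for one host (hypothetical structure)."""

    def __init__(self):
        self.components = {}

    def observe(self, component, value):
        # Each new measurement refines that component's baseline.
        self.components.setdefault(component, RunningBaseline()).update(value)

    def check_path(self, path_metrics, k=3.0):
        """Return the components in one execution path whose metric is anomalous."""
        return [c for c, v in path_metrics.items()
                if c in self.components and self.components[c].is_anomalous(v, k)]

host = HostBaselines()
for i in range(20):
    host.observe("ServletA", 10.0 if i % 2 == 0 else 12.0)  # mean 11, stddev 1

# "ServletB" has no baseline yet, so it is not judged.
print(host.check_path({"ServletA": 16.0, "ServletB": 5.0}))
```

Because each update is constant-time and constant-space, the baseline can be refreshed continuously as metrics arrive rather than being recomputed from stored history at fixed intervals.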
28. At least one processor readable storage device having processor readable code embodied thereon for programming at least one processor to perform a method, the method comprising:
receiving data from a plurality of subsystems in the distributed system which identifies metrics for the plurality of subsystems when the plurality of subsystems perform processing for a plurality of execution paths in the distributed system;
automatically establishing baseline metrics for the subsystems, responsive to the received data;
receiving data from particular subsystems of the plurality of subsystems which identifies metrics for the particular subsystems when the particular subsystems perform processing for a particular execution path in the distributed system;
determining if the metrics for the particular subsystems are anomalous based on the baseline metrics; and
reporting, responsive to the determining.
33. At least one processor readable storage device having processor readable code embodied thereon for programming at least one processor to perform a method, the method comprising:
providing metrics for a plurality of components of at least one host, the plurality of components are associated with a plurality of execution paths;
automatically establishing baseline metrics for the plurality of components based on the provided metrics for the plurality of components;
providing metrics for particular components of the at least one host, the particular components are in a particular execution path;
determining if the metrics for the particular components are anomalous based on the baseline metrics; and
reporting, responsive to the determining.
Specification