System, method and computer program product for optimized root cause analysis
11 Assignments
0 Petitions
Abstract
Embodiments disclosed herein can significantly optimize a root cause analysis and substantially reduce the overall time needed to isolate the root cause or causes of service degradation in an IT environment. Building on the ability of an abnormality detection algorithm to correlate an alarm with one or more events, embodiments disclosed herein can apply data correlation to the data points that the metrics involved in generating the alarm and the event(s) collected within a specified time window. The level of correlation between the primary metric and the probable cause metrics may be adjusted using the ratio between theoretical data points and actual data points. The final root cause analysis (RCA) score may be modified depending on the adjusted correlation value and presented for user review through a user interface.
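The abstract leaves the exact form of the adjustment open. The following is a minimal sketch, assuming that the confidence factor is the ratio of actual to theoretical data points in the window (clamped to [0, 1]), that the theoretical count follows from a fixed sampling interval, and that the factor scales the correlation coefficient multiplicatively. The function names are hypothetical.

```python
def theoretical_points(window_seconds: int, sample_interval_seconds: int) -> int:
    """Samples a metric should report in the window, assuming a fixed sampling interval."""
    if sample_interval_seconds <= 0:
        raise ValueError("sample_interval_seconds must be positive")
    return window_seconds // sample_interval_seconds


def confidence_factor(actual_points: int, expected_points: int) -> float:
    """Ratio of aligned data points actually available to the theoretical count, clamped to [0, 1]."""
    if expected_points <= 0:
        raise ValueError("expected_points must be positive")
    return max(0.0, min(1.0, actual_points / expected_points))


def adjust_correlation(r: float, actual_points: int, expected_points: int) -> float:
    """Shrink the correlation coefficient toward 0 when data points are missing.

    The multiplicative form is an assumption, not taken from the patent text.
    """
    return r * confidence_factor(actual_points, expected_points)
```

For example, a one-hour window sampled every five minutes should hold 12 points; if only 9 aligned pairs survive validation, the confidence factor is 0.75 and a raw coefficient of 0.8 drops to about 0.6. Shrinking toward zero keeps a strong correlation computed from sparse data from outranking a slightly weaker one computed from a full window.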
20 Claims
1. A root cause analysis optimization method, comprising:
at a performance monitoring system, identifying a primary alarm metric and one or more probable cause metrics involved in producing a list of root cause analysis (RCA) candidates, wherein each of the RCA candidates is associated with an RCA score that indicates a level of correlation between each RCA candidate and the primary alarm metric;
determining data points to use from the primary alarm metric and each probable cause metric based on an alarm time of the primary alarm metric;
validating the data points between the primary alarm metric and each probable cause metric to remove discrepancies by aligning the data points in time to compute a correlation coefficient;
applying a data correlation algorithm to the validated data points to compute the correlation coefficient between primary and secondary aligned data points;
calculating a confidence factor by computing a data point ratio between the primary and secondary aligned data points using actual data points and theoretical data points;
adjusting the level of correlation of the validated data points by adjusting the correlation coefficient using the confidence factor including the data point ratio;
adjusting the RCA scores of the RCA candidates based on the computed correlation coefficient including using the adjusted correlation coefficient; and
presenting to a user an optimized RCA candidate list listing a fraction of the RCA candidates sorted based on the adjusted RCA score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
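The claims do not specify how the window is chosen, how points are aligned, or which correlation algorithm is used. As one possible reading, the sketch below keeps samples in a look-back window ending at the alarm time, aligns the primary and probable cause series by bucketing timestamps to a common sampling interval and discarding buckets that only one series covers, and then applies the Pearson correlation coefficient. The (timestamp, value) sample format, the `lookback` and `interval` parameters, and all function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, float]  # (epoch seconds, metric value): assumed format


def window_points(samples: Sequence[Sample], alarm_time: int, lookback: int) -> List[Sample]:
    """Keep samples whose timestamp falls in [alarm_time - lookback, alarm_time]."""
    start = alarm_time - lookback
    return [(t, v) for t, v in samples if start <= t <= alarm_time]


def align(primary: Sequence[Sample], cause: Sequence[Sample],
          interval: int) -> Tuple[List[float], List[float]]:
    """Bucket both series to a common interval and keep only buckets present in both."""
    def bucket(series: Sequence[Sample]) -> Dict[int, float]:
        out: Dict[int, float] = {}
        for t, v in series:
            out[t // interval] = v  # if a bucket has several samples, the last one wins (assumption)
        return out

    p, c = bucket(primary), bucket(cause)
    common = sorted(p.keys() & c.keys())  # drop buckets where either metric is missing
    return [p[k] for k in common], [c[k] for k in common]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal (assumption)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, the number of aligned pairs that `align` returns is the "actual" data point count, which the claimed confidence factor compares against the theoretical count for the same window.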
12. A computer program product comprising one or more non-transitory computer-readable storage media storing computer instructions translatable by one or more processors of a computer system to perform:
identifying a primary alarm metric and one or more probable cause metrics involved in producing a list of root cause analysis (RCA) candidates, wherein each of the RCA candidates is associated with an RCA score that indicates a level of correlation between each RCA candidate and the primary alarm metric;
determining data points to use from the primary alarm metric and each probable cause metric based on an alarm time of the primary alarm metric;
validating the data points between the primary alarm metric and each probable cause metric to remove discrepancies by aligning the data points in time to compute a correlation coefficient;
applying a data correlation algorithm to the validated data points to compute the correlation coefficient between primary and secondary aligned data points;
calculating a confidence factor by computing a data point ratio between the primary and secondary aligned data points using actual data points and theoretical data points;
adjusting the level of correlation of the validated data points by adjusting the correlation coefficient using the confidence factor including the data point ratio;
adjusting the RCA scores of the RCA candidates based on the computed correlation coefficient including using the adjusted correlation coefficient; and
presenting to a user an optimized RCA candidate list listing a fraction of the RCA candidates sorted based on the adjusted RCA score.
Dependent claims: 13, 14, 15, 16, 17.
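For the final two steps, rescoring the candidates and presenting a fraction of them, the claims state only that the adjusted correlation coefficient feeds into the RCA score. This sketch assumes the original score is multiplied by the absolute value of the adjusted coefficient (treating a strong negative correlation as just as informative as a strong positive one) and that the presented fraction is the top `fraction` of candidates, rounded up. The `RcaCandidate` type, its fields, and `optimized_candidate_list` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RcaCandidate:
    metric: str        # probable cause metric name (hypothetical field)
    score: float       # RCA score
    adjusted_r: float  # correlation coefficient after the confidence-factor adjustment


def optimized_candidate_list(candidates: List[RcaCandidate],
                             fraction: float = 0.2) -> List[RcaCandidate]:
    """Rescore candidates with the adjusted coefficient and return the top fraction, best first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Assumed rescoring rule: scale the original score by |adjusted_r|.
    rescored = [
        RcaCandidate(c.metric, c.score * abs(c.adjusted_r), c.adjusted_r)
        for c in candidates
    ]
    rescored.sort(key=lambda c: c.score, reverse=True)
    # Show at least one candidate when any exist; round the fraction up.
    keep = max(1, math.ceil(len(rescored) * fraction)) if rescored else 0
    return rescored[:keep]
```

Rounding up with a floor of one candidate keeps the list from coming back empty for small candidate sets, which would defeat the purpose of presenting a shortlist to the user.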
18. A system, comprising:
one or more processors; and
one or more computer-readable storage media storing computer instructions translatable by the one or more processors to perform:
identifying a primary alarm metric and one or more probable cause metrics involved in producing a list of root cause analysis (RCA) candidates, wherein each of the RCA candidates is associated with an RCA score that indicates a level of correlation between each RCA candidate and the primary alarm metric;
determining data points to use from the primary alarm metric and each probable cause metric based on an alarm time of the primary alarm metric;
validating the data points between the primary alarm metric and each probable cause metric to remove discrepancies by aligning the data points in time to compute a correlation coefficient;
applying a data correlation algorithm to the validated data points to compute the correlation coefficient between primary and secondary aligned data points;
calculating a confidence factor by computing a data point ratio between the primary and secondary aligned data points using actual data points and theoretical data points;
adjusting the level of correlation of the validated data points by adjusting the correlation coefficient using the confidence factor including the data point ratio;
adjusting the RCA scores of the RCA candidates based on the computed correlation coefficient including using the adjusted correlation coefficient; and
presenting to a user an optimized RCA candidate list listing a fraction of the RCA candidates sorted based on the adjusted RCA score.
Dependent claims: 19, 20.
Specification