Collaborative troubleshooting computer systems using fault tree analysis
7 Assignments
Abstract
Embodiments of the invention provide techniques for troubleshooting computer systems using fault tree analysis. In one embodiment, data parameters describing the status of a system may be monitored to detect a fault. When a fault occurs, fault tree analysis metadata may be evaluated in an attempt to determine the root cause of the fault. If a root cause can be determined automatically, it may be presented to a user in a troubleshooting console or used to trigger an automated corrective action. If a root cause cannot be determined automatically, the user may instead be presented with additional fault tree analysis metadata and any relevant data parameters in the troubleshooting console, so that the user can determine the root cause of the fault event.
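The control flow in the abstract (monitor parameters, detect a fault, try to resolve it automatically, otherwise hand it to the user) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the threshold table, the parameter names (`cpu_util`, `queue_depth`), the action labels and the `lookup_resolver` stand-in for fault tree evaluation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical monitored parameters and their upper limits.
THRESHOLDS: Dict[str, float] = {"cpu_util": 0.95, "queue_depth": 1000.0}


@dataclass
class Outcome:
    fault: Optional[str]       # detected fault event, if any
    root_cause: Optional[str]  # automatically determined root cause, if any
    action: str                # "none", "auto_correct", "present_cause" or "show_fta_metadata"


def detect_fault(params: Dict[str, float]) -> Optional[str]:
    """Report the first monitored parameter that exceeds its limit."""
    for name, limit in THRESHOLDS.items():
        if params.get(name, 0.0) > limit:
            return f"{name}_exceeded"
    return None


def troubleshoot(
    params: Dict[str, float],
    resolve: Callable[[str, Dict[str, float]], Optional[str]],
    auto_correct: bool = False,
) -> Outcome:
    fault = detect_fault(params)
    if fault is None:
        return Outcome(None, None, "none")
    cause = resolve(fault, params)  # stands in for evaluating fault tree metadata
    if cause is None:
        # Fall back to the user: show FTA metadata and parameters in the console.
        return Outcome(fault, None, "show_fta_metadata")
    # Either present the cause or trigger an automated corrective action.
    return Outcome(fault, cause, "auto_correct" if auto_correct else "present_cause")


# Hypothetical resolver: a lookup table standing in for fault tree evaluation.
KNOWN_CAUSES = {"cpu_util_exceeded": "runaway_process"}


def lookup_resolver(fault: str, params: Dict[str, float]) -> Optional[str]:
    return KNOWN_CAUSES.get(fault)
```

Passing the resolver as a function keeps detection separate from diagnosis, so the lookup table could later be swapped for a real fault tree evaluation without touching the monitoring code.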
43 Citations
21 Claims
1. A computer-implemented method for troubleshooting a computer system, comprising:
retrieving a data structure storing a fault tree analysis describing a fault event detected in the computing system, wherein the fault tree analysis specifies a hierarchical structure specifying one or more symptoms associated with the fault event and one or more root causes associated with the fault event;
retrieving metadata characterizing the fault event, wherein the metadata is supplied by a user-community; and
predicting, based on an evaluation of the fault tree analysis and the metadata, a root cause for the fault event.
Dependent claims: 2, 3, 4, 5, 6, 7.
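The claimed combination of a hierarchical fault tree (fault event, symptoms, root causes) with user-community metadata can be illustrated with a small data structure and a scoring rule. This is a sketch under loudly stated assumptions: the `FaultNode` type is hypothetical, the community metadata is assumed to be a count of confirmations per root cause, and the scoring rule (fraction of a cause's symptoms that were observed, multiplied by a Laplace-smoothed community prior) is an illustrative choice, not the method the claim specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class FaultNode:
    """Hypothetical fault tree node; kind is "event", "symptom" or "root_cause"."""
    name: str
    kind: str
    children: List["FaultNode"] = field(default_factory=list)


def symptoms_per_root_cause(
    node: FaultNode, path: FrozenSet[str] = frozenset()
) -> Dict[str, Set[str]]:
    """Map each root cause to the symptoms on the branches that lead to it."""
    if node.kind == "symptom":
        path = path | {node.name}
    result: Dict[str, Set[str]] = {}
    if node.kind == "root_cause":
        result[node.name] = set(path)
    for child in node.children:
        for cause, syms in symptoms_per_root_cause(child, path).items():
            result.setdefault(cause, set()).update(syms)
    return result


def predict_root_cause(
    tree: FaultNode, observed: Set[str], votes: Dict[str, int]
) -> Optional[str]:
    """Score = fraction of a cause's symptoms that were observed, times a
    Laplace-smoothed prior built from community confirmation counts."""
    causes = symptoms_per_root_cause(tree)
    total = sum(votes.get(c, 0) for c in causes)
    best, best_score = None, 0.0
    for cause, syms in causes.items():
        coverage = len(syms & observed) / len(syms) if syms else 0.0
        prior = (votes.get(cause, 0) + 1) / (total + len(causes))
        score = coverage * prior
        if score > best_score:  # ties keep the cause encountered first
            best, best_score = cause, score
    return best


# Example tree for a hypothetical database timeout fault.
TREE = FaultNode("db_timeout", "event", [
    FaultNode("slow_queries", "symptom", [
        FaultNode("missing_index", "root_cause"),
        FaultNode("lock_contention", "root_cause"),
    ]),
    FaultNode("connection_refused", "symptom", [
        FaultNode("pool_exhausted", "root_cause"),
    ]),
])
```

With only `slow_queries` observed, the tree alone cannot separate `missing_index` from `lock_contention`; community confirmations such as `{"missing_index": 1, "lock_contention": 8}` tip the prediction toward `lock_contention`.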
8. A computer useable storage medium having a computer readable program, wherein the computer readable program when executed on a computer causes the computer to perform an operation, comprising:
retrieving a data structure storing a fault tree analysis describing a fault event detected in the computing system, wherein the fault tree analysis specifies a hierarchical structure specifying one or more symptoms associated with the fault event and one or more root causes associated with the fault event;
retrieving metadata characterizing the fault event, wherein the metadata is supplied by a user-community; and
predicting, based on an evaluation of the fault tree analysis and the metadata, a root cause for the fault event.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A system, comprising:
a processor; and
a memory containing a monitoring program configured to monitor the availability of a networked software application, wherein the monitoring program, when executed on the processor, is configured to:
retrieve a data structure storing a fault tree analysis describing a fault event detected in the computing system, wherein the fault tree analysis specifies a hierarchical structure specifying one or more symptoms associated with the fault event and one or more root causes associated with the fault event;
retrieve metadata characterizing the fault event, wherein the metadata is supplied by a user-community; and
predict, based on an evaluation of the fault tree analysis and the metadata, a root cause for the fault event.
Dependent claims: 16, 17, 18, 19, 20, 21.
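The monitoring program in this claim watches the availability of a networked software application before any fault tree is retrieved. A minimal sketch follows, assuming availability is judged by an HTTP health probe and that a fault event is raised after a hypothetical `max_failures` consecutive failed probes; the event name `application_unavailable` and the consecutive-failure policy are illustrative assumptions, not details from the claim.

```python
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


class AvailabilityMonitor:
    """Report a fault event once `max_failures` consecutive probes have failed."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3) -> None:
        self.probe = probe
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> Optional[str]:
        if self.probe():
            self.failures = 0  # application reachable again; reset the streak
            return None
        self.failures += 1
        # Emit the event only on the probe that crosses the threshold,
        # not on every failure after it.
        if self.failures == self.max_failures:
            return "application_unavailable"
        return None
```

In use, the probe might be `lambda: http_probe("http://localhost:8080/health")` (a hypothetical endpoint), and a non-`None` result from `check()` would be the point at which the fault tree and community metadata for that event are retrieved. Requiring several consecutive failures is one common way to avoid raising fault events on a single dropped request.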
Specification