Grouping Failures to Infer Common Causes
First Claim
1. A computer-executable method, comprising:
- in a system of interrelated components, monitoring numerous components over time to detect a failure status of each of the numerous components with respect to intervals of the time;
- for each interval of the time, receiving a failure indication for each component that is in failure during that interval; and
- forming one or more groups of the received failure indications, each group inferring a cause of failure common to the group.
Abstract
Systems and methods establish groups among numerous indications of failure in order to infer a cause of failure common to each group. In one implementation, a system computes the groups such that each group has the maximum likelihood of resulting from a common failure. Indications of failure are grouped by probability, even when a group's inferred cause of failure is not directly observable in the system. In one implementation, related matrices provide a system for receiving numerous health indications from each of numerous autonomous systems connected with the Internet. A correlational matrix links input (failure symptoms) and output (known or unknown root causes) through probability-based hypothetical groupings of the failure indications. The matrices are iteratively refined according to self-consistency and parsimony metrics to provide most likely groupings of indicators and most likely causes of failure.
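The grouping idea in the abstract can be illustrated with a deliberately simplified sketch. It does not implement the maximum-likelihood grouping the abstract describes; as a stand-in, it groups components whose failure flags are identical across all intervals, on the assumption that identical failure patterns hint at a shared cause. The function name `group_by_failure_signature` and the component names are hypothetical.

```python
from collections import defaultdict


def group_by_failure_signature(status):
    """Group components that fail in exactly the same intervals.

    status maps a component name to a list of 0/1 failure flags,
    one flag per time interval (an assumed input layout).
    Returns a dict mapping each failure pattern to the sorted list
    of components that share it. Components that never fail are skipped.
    """
    groups = defaultdict(list)
    for component, flags in status.items():
        signature = tuple(flags)
        if any(signature):  # ignore components with no failure indication
            groups[signature].append(component)
    return {sig: sorted(members) for sig, members in groups.items()}


# Hypothetical monitoring data: four components over four intervals.
status = {
    "as1": [0, 1, 1, 0],
    "as2": [0, 1, 1, 0],
    "as3": [1, 0, 0, 0],
    "as4": [0, 0, 0, 0],
}
groups = group_by_failure_signature(status)
```

Here `as1` and `as2` fall into one group because they fail in the same intervals, which is the kind of evidence from which a common (possibly unobserved) cause could be inferred. A probabilistic method would also tolerate noisy or partially overlapping patterns, which this exact-match sketch does not.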
20 Claims
1. A computer-executable method, comprising:
- in a system of interrelated components, monitoring numerous components over time to detect a failure status of each of the numerous components with respect to intervals of the time;
- for each interval of the time, receiving a failure indication for each component that is in failure during that interval; and
- forming one or more groups of the received failure indications, each group inferring a cause of failure common to the group.
Dependent claims: 2-15.
16. A system, comprising:
- an input matrix to arrange with respect to time, failure indications received from sensors monitoring a network;
- an output matrix to arrange with respect to time, inferred causes of failure to be associated with the failure indications; and
- a 3-dimensional intermediate matrix to associate the failure indications with the inferred causes of failure, based on probability.
Dependent claims: 17-19.
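As a rough illustration of how the three matrices in claim 16 might relate, the sketch below assumes a particular layout that the claim itself does not specify: `input_matrix[t][i]` flags failure indication `i` at interval `t`, `intermediate[t][i][c]` holds an assumed probability that indication `i` at interval `t` stems from cause `c`, and the output matrix marks which causes are inferred at each interval. Picking the single most probable cause per indication is a simplification, not the iterative refinement the abstract mentions, and the function name `infer_causes` is hypothetical.

```python
def infer_causes(input_matrix, intermediate):
    """Derive an output matrix of inferred causes per interval.

    input_matrix[t][i]    : 1 if indication i is in failure at interval t.
    intermediate[t][i][c] : assumed probability that indication i at
                            interval t is due to cause c.
    Returns output[t][c]  : 1 if some failing indication at interval t
                            is most probably due to cause c, else 0.
    """
    n_causes = len(intermediate[0][0])
    output = []
    for t, row in enumerate(input_matrix):
        active = [0] * n_causes
        for i, failing in enumerate(row):
            if failing:
                probs = intermediate[t][i]
                # Choose the most probable cause for this indication.
                best = max(range(n_causes), key=lambda c: probs[c])
                active[best] = 1
        output.append(active)
    return output


# Hypothetical data: two intervals, three indications, two causes.
input_matrix = [[1, 1, 0], [0, 0, 1]]
intermediate = [
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5], [0.3, 0.7]],
]
output_matrix = infer_causes(input_matrix, intermediate)
```

In this example the two indications failing in the first interval both point to cause 0, so they would form one group, while the indication failing in the second interval points to cause 1.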
20. A system, comprising:
- means for receiving numerous health indications from each of numerous components connected with the Internet; and
- means for grouping failure incidents among the health indications into groups, such that each group implies a cause of failure common to the group.
Specification