Grouping failures to infer common causes
First Claim
1. A computer-executable method, comprising:
in a system of interrelated components, monitoring numerous components over time to detect a failure status of each of the numerous components with respect to intervals of the time;
for each interval of the time, receiving a failure indication for each component that is in failure during that interval;
forming one or more groups of the received failure indications, each group inferring a cause of failure common to the group;
arranging the received failure indications into a first matrix representing components versus time, the first matrix indicating the time intervals during which each component is in failure;
arranging the inferred causes of failure into a second matrix representing inferred causes of failure versus time;
correlating the components in the first matrix to the inferred causes of failure in the second matrix via a 3-dimensional intermediate matrix representing time slices, each time slice containing probability-based hypothetical groupings of the failure indications received at the time of the time slice and corresponding inferred causes of failure; and
ranking candidate values for each time slice to distinguish more probable hypothetical groupings of the failure indications from less probable hypothesized groupings of the failure indications.
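The two "arranging" steps of claim 1 can be pictured with plain nested lists. The sketch below assumes failure indications arrive as a mapping from interval index to the set of failing component names, and that one grouping per interval has already been chosen. The function names and the rule that a cause is identified by the set of components it explains are hypothetical, used only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def build_component_matrix(indications: Dict[int, Set[str]],
                           components: List[str],
                           num_intervals: int) -> List[List[int]]:
    """First matrix (components x time): entry [i][t] is 1 when component i
    reported a failure during interval t, else 0."""
    row_of = {c: r for r, c in enumerate(components)}
    matrix = [[0] * num_intervals for _ in components]
    for t, failing in indications.items():
        for c in failing:
            matrix[row_of[c]][t] = 1
    return matrix


def build_cause_matrix(groupings: Dict[int, List[Set[str]]],
                       num_intervals: int) -> Tuple[List[FrozenSet[str]], List[List[int]]]:
    """Second matrix (inferred causes x time), given one chosen grouping per
    interval. Assumption for this sketch: a cause is identified by the set of
    components it explains, so the same group in two intervals is one row."""
    causes: List[FrozenSet[str]] = []
    index: Dict[FrozenSet[str], int] = {}
    for t in sorted(groupings):
        for group in groupings[t]:
            key = frozenset(group)
            if key not in index:
                index[key] = len(causes)
                causes.append(key)
    matrix = [[0] * num_intervals for _ in causes]
    for t, groups in groupings.items():
        for group in groups:
            matrix[index[frozenset(group)]][t] = 1
    return causes, matrix
```

For components `["A", "B", "C"]` with `A` and `B` failing in interval 0 and `C` in interval 2, the first matrix is `[[1, 0, 0], [1, 0, 0], [0, 0, 1]]`; grouping `{A, B}` and `{C}` yields two cause rows, `[1, 0, 0]` and `[0, 0, 1]`.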
2 Assignments
0 Petitions
Abstract
Systems and methods establish groups among numerous indications of failure in order to infer a cause of failure common to each group. In one implementation, a system computes the groups such that each group has the maximum likelihood of resulting from a common failure. Indications of failure are grouped by probability, even when a group's inferred cause of failure is not directly observable in the system. In one implementation, related matrices provide a system for receiving numerous health indications from each of numerous autonomous systems connected with the Internet. A correlational matrix links input (failure symptoms) and output (known or unknown root causes) through probability-based hypothetical groupings of the failure indications. The matrices are iteratively refined according to self-consistency and parsimony metrics to provide the most likely groupings of indicators and the most likely causes of failure.
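The abstract names self-consistency and parsimony metrics without defining them. The sketch below shows one possible way to score and rank the hypothetical groupings of a single time slice. The Jaccard-based consistency term, the `threshold` and `penalty` values, and all function names are assumptions made for illustration, not the patented method. It enumerates every set partition of the failing components, a count that grows with the Bell numbers, so it only suits slices with a few simultaneous failures.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Tuple

Grouping = List[List[str]]


def partitions(items: Sequence[str]) -> Iterator[Grouping]:
    """Yield every way to split items into non-empty groups (set partitions)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Add `first` to each existing group in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a group (a separate inferred cause) of its own.
        yield [[first]] + part


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Overlap of two 0/1 failure histories (rows of the components x time matrix)."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return inter / union if union else 0.0


def score(grouping: Grouping, history: Dict[str, Sequence[int]],
          threshold: float = 0.5, penalty: float = 0.1) -> float:
    """Self-consistency: reward grouping components whose histories overlap by
    more than `threshold`. Parsimony: charge `penalty` per inferred cause."""
    consistency = sum(jaccard(history[a], history[b]) - threshold
                      for group in grouping
                      for a, b in combinations(group, 2))
    return consistency - penalty * len(grouping)


def rank_groupings(failing: Sequence[str], history: Dict[str, Sequence[int]],
                   threshold: float = 0.5,
                   penalty: float = 0.1) -> List[Tuple[float, Grouping]]:
    """Score every hypothetical grouping of one time slice, best first."""
    scored = [(score(p, history, threshold, penalty), p)
              for p in partitions(sorted(failing))]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored
```

With histories `A = B = [1, 1, 0, 1]` and `C = [0, 0, 1, 1]`, all three failing in the last interval, the top-ranked hypothesis is `[["A", "B"], ["C"]]`: `A` and `B` always fail together, so they are attributed to one cause, while `C` is explained separately.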
19 Claims
1. A computer-executable method, comprising the steps recited in full under First Claim above. (Dependent claims: 2 through 14.)
15. A system, comprising:
an input matrix to arrange failure indications received from sensors monitoring a network with respect to time;
an output matrix to arrange inferred causes of failure to be associated with the failure indications with respect to time; and
a 3-dimensional intermediate matrix to associate the failure indications with the inferred causes of failure, the 3-dimensional intermediate matrix representing time slices, each time slice containing probability-based hypothetical groupings of the failure indications received at the time of the time slice and corresponding inferred causes of failure.
(Dependent claims: 16 through 18.)
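Claim 15 does not fix a memory layout for the 3-dimensional intermediate matrix. One possible reading, sketched below, indexes it as `cube[t][i][k]`: the probability that component `i`'s failure at interval `t` is attributed to cause `k`. The per-slice hypothesis probabilities are taken as given (they might come from normalizing ranking scores, which is an assumption). The output matrix is then derived as the probability that each cause is active in each interval. All names here are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# (probability, grouping of the failing components) for one hypothesis
Hypothesis = Tuple[float, List[List[str]]]


def build_intermediate(components: Sequence[str],
                       slices: Dict[int, List[Hypothesis]],
                       num_intervals: int):
    """Return (causes, cube, output) where
    cube[t][i][k] = probability that component i's failure at interval t
                    is attributed to cause k, and
    output[k][t]  = probability that cause k is active at interval t."""
    # Each distinct group of components is treated as one candidate cause
    # (a hypothetical identification rule used only for this sketch).
    causes: List[FrozenSet[str]] = []
    index: Dict[FrozenSet[str], int] = {}
    for t in sorted(slices):
        for _, grouping in slices[t]:
            for group in grouping:
                key = frozenset(group)
                if key not in index:
                    index[key] = len(causes)
                    causes.append(key)
    row_of = {c: r for r, c in enumerate(components)}
    cube = [[[0.0] * len(causes) for _ in components] for _ in range(num_intervals)]
    output = [[0.0] * num_intervals for _ in causes]
    for t, hypotheses in slices.items():
        for prob, grouping in hypotheses:
            for group in grouping:
                k = index[frozenset(group)]
                # A cause is active whenever a hypothesis containing it holds.
                output[k][t] += prob
                for c in group:
                    cube[t][row_of[c]][k] += prob
    return causes, cube, output
```

If interval 0 holds two hypotheses, "`A` and `B` share a cause" with probability 0.7 and "they fail independently" with probability 0.3, then `cube[0]` for `A` is `[0.7, 0.3, 0.0]` over the causes `{A, B}`, `{A}`, `{B}`, and each failing component's row sums to 1.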
19. A system, comprising:
a first component for receiving numerous health indications from each of numerous components connected with the Internet;
a second component for grouping failure incidents among the health indications into groups, such that each group implies a cause of failure common to the group;
a third component for arranging the received failure indications into a first matrix representing components versus time, the first matrix indicating the time intervals during which each component is in failure;
a fourth component for arranging the inferred causes of failure into a second matrix representing inferred causes of failure versus time;
a fifth component for correlating the components in the first matrix to the inferred causes of failure in the second matrix via a 3-dimensional intermediate matrix representing time slices, each time slice containing probability-based hypothetical groupings of the failure indications received at the time of the time slice and corresponding inferred causes of failure; and
a sixth component for ranking candidate values for each time slice to distinguish more probable hypothetical groupings of the failure indications from less probable hypothesized groupings of the failure indications.
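The claims do not say how raw health indications become per-interval failure indications. The sketch below assumes each health report is a hypothetical `(timestamp_seconds, component_id, healthy)` tuple. It judges a component to be in failure for an interval when at least half of its reports in that interval are unhealthy; that threshold rule is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (timestamp_seconds, component_id, healthy?): a hypothetical report format
HealthReport = Tuple[float, str, bool]


def failures_per_interval(reports: Iterable[HealthReport],
                          interval_seconds: float,
                          min_failure_fraction: float = 0.5) -> Dict[int, Set[str]]:
    """Map each interval index to the set of components judged 'in failure'.
    A component fails in an interval when at least min_failure_fraction of
    its reports in that interval are unhealthy (an assumed rule)."""
    # (interval, component) -> [unhealthy report count, total report count]
    counts: Dict[Tuple[int, str], List[int]] = defaultdict(lambda: [0, 0])
    for ts, comp, healthy in reports:
        key = (int(ts // interval_seconds), comp)
        counts[key][1] += 1
        if not healthy:
            counts[key][0] += 1
    result: Dict[int, Set[str]] = defaultdict(set)
    for (t, comp), (bad, total) in counts.items():
        if bad / total >= min_failure_fraction:
            result[t].add(comp)
    # Intervals in which no component failed are simply absent.
    return dict(result)
```

With 60-second intervals, two unhealthy reports from `A` at 0 s and 10 s put `A` in failure for interval 0. A single unhealthy report from `B` at 65 s puts `B` in failure for interval 1. The resulting mapping is exactly the input expected by a components-versus-time matrix.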
Specification