Apparatus and method for event correlation and problem reporting
Abstract
An apparatus and method are provided for efficiently determining the source of problems in a complex system based on observable events. By splitting the problem identification process into two separate activities, (1) generating efficient codes for problem identification and (2) decoding the problems at runtime, the efficiency of the problem identification process is significantly increased. Various embodiments of the invention contemplate creating a causality matrix which relates observable symptoms to likely problems in the system, reducing the causality matrix into a minimal codebook by eliminating redundant or unnecessary information, monitoring the observable symptoms, and decoding problems by comparing the observable symptoms against the minimal codebook using various best-fit approaches. The minimal codebook also identifies the observable symptoms whose monitoring yields the greatest benefit compared to the others.
By defining a distance measure between symptoms and codes in the codebook, the invention can tolerate a loss of symptoms or spurious symptoms without failure. Changing the radius of the codebook allows the ambiguity of problem identification to be adjusted easily. The invention also allows probabilistic and temporal correlations to be monitored. Due to the degree of data reduction prior to runtime, extremely large and complex systems involving many observable events can be efficiently monitored with much smaller computing resources than would otherwise be possible.
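The decoding idea in the abstract can be illustrated with a small sketch. It assumes binary codes and uses Hamming distance as the mismatch measure; the problem names, the five-symptom codes, and the `decode` helper are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical codebook: one binary code per problem, one position per monitored symptom.
# The codes are chosen so that any two differ in at least three positions.
CODEBOOK: Dict[str, Tuple[int, ...]] = {
    "link_down":   (1, 1, 1, 0, 0),
    "router_fail": (0, 0, 1, 1, 1),
    "disk_full":   (1, 0, 0, 1, 0),
}

def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Mismatch measure (an assumption here): number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(observed: Tuple[int, ...],
           codebook: Dict[str, Tuple[int, ...]],
           radius: int) -> List[Tuple[str, int]]:
    """Return (problem, distance) pairs within `radius` of the observation, closest first."""
    matches = [(problem, hamming(observed, code)) for problem, code in codebook.items()]
    matches = [m for m in matches if m[1] <= radius]
    return sorted(matches, key=lambda m: (m[1], m[0]))

# One symptom of "link_down" was lost in transit: the second position reads 0 instead of 1.
print(decode((1, 0, 1, 0, 0), CODEBOOK, radius=1))
print(decode((1, 0, 1, 0, 0), CODEBOOK, radius=2))
```

With a radius of 1 the lost symptom still decodes to a single problem; widening the radius to 2 admits a second candidate, which is one way to read the abstract's remark that the radius controls the ambiguity of problem identification.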
76 Claims
1. A method for detecting problems in a system which generates a plurality of symptoms, the method comprising the steps of:
(1) providing a computer-accessible codebook comprising a matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
(2) monitoring said plurality of symptoms generated by said system over time;
(3) decoding, through the use of a computer, said monitored symptoms into one or more of said likely problems by determining a mismatch measure between one or more of said values in said codebook and one or more of said monitored symptoms; and
(4) generating a report comprising said one or more likely problems decoded from said codebook.
Dependent claims: 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
6. The method of claim wherein step (1) comprises the step of specifying each of said values as a probability, said probability reflecting a likelihood that a corresponding symptom was caused by a corresponding problem.
21. A method for detecting problems in a system which generates a plurality of symptoms, the method comprising the steps of:
(1) generating a causality matrix comprising a first matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
(2) reducing said causality matrix into a codebook comprising a second matrix of values fewer in number than said first matrix of values by eliminating duplicative sets of values from said first matrix;
(3) monitoring said plurality of symptoms generated by said system over time;
(4) decoding, through the use of a computer, said monitored symptoms into one or more of said likely problems by determining a mismatch measure between one or more of said values in said codebook and one or more of said monitored symptoms; and
(5) reporting said one or more likely problems decoded from said codebook.
Dependent claims: 22, 23, 24, 25, 26, 27
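The reduction step of claim 21 can be sketched as follows. This is a minimal illustration and not the patented procedure: it assumes the causality matrix is stored as one row of 0/1 values per symptom (one column per problem) and reads "duplicative sets of values" as symptom rows that repeat an earlier row. The symptom names and the `reduce_causality_matrix` helper are hypothetical.

```python
from typing import Dict, Tuple

def reduce_causality_matrix(matrix: Dict[str, Tuple[int, ...]]) -> Dict[str, Tuple[int, ...]]:
    """Keep the first symptom for each distinct row; later identical rows add no information."""
    seen = set()
    reduced: Dict[str, Tuple[int, ...]] = {}
    for symptom, row in matrix.items():
        if row in seen:
            continue  # this symptom distinguishes problems exactly as an earlier one does
        seen.add(row)
        reduced[symptom] = row
    return reduced

# Hypothetical causality matrix: rows are symptoms, columns are three problems.
CAUSALITY = {
    "timeout":     (1, 0, 1),
    "retransmit":  (1, 0, 1),  # duplicates "timeout"
    "high_cpu":    (0, 1, 1),
}

print(reduce_causality_matrix(CAUSALITY))
```

The reduced matrix has fewer values than the original while separating the problems equally well, which is the property the claim relies on.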
28. A method of generating a codebook for use in a process of detecting problems in a system which generates a plurality of symptoms, the method comprising the steps of:
(1) preparing a causality matrix comprising a matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
(2) making said causality matrix well-formed by deleting redundant sets of values from said matrix of values;
(3) selecting a radius corresponding to a desired level of problem identification;
(4) generating, through the use of a computer, an optimal codebook from said well-formed causality matrix by selecting values from said well-formed causality matrix based on comparisons with said radius; and
(5) storing said optimal codebook in a computer storage device.
Dependent claims: 29, 30, 31, 32
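One way to picture the radius-based selection in claim 28 is a greedy pass over the symptoms. The sketch below is an assumption-laden illustration: it treats the radius as the minimum Hamming distance required between the codes of any two problems, and it adds a symptom only if it separates a pair of problems that is still too close. A greedy pass is not guaranteed to yield the smallest possible codebook, so it should not be read as the claimed "optimal" method. The matrix contents and the `select_symptoms` helper are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def select_symptoms(matrix: Dict[str, Tuple[int, ...]], radius: int) -> List[str]:
    """Greedily pick symptoms until every pair of problems differs in >= radius positions."""
    n = len(next(iter(matrix.values())))  # number of problems; assumes a non-empty matrix
    selected: List[str] = []

    def distance(i: int, j: int) -> int:
        # Hamming distance between the codes of problems i and j over the selected symptoms
        return sum(matrix[s][i] != matrix[s][j] for s in selected)

    def short_pairs() -> List[Tuple[int, int]]:
        return [(i, j) for i, j in combinations(range(n), 2) if distance(i, j) < radius]

    for symptom, row in matrix.items():
        pending = short_pairs()
        if not pending:
            break  # every pair of problems is already far enough apart
        if any(row[i] != row[j] for i, j in pending):
            selected.append(symptom)

    if short_pairs():
        raise ValueError("radius not achievable with the available symptoms")
    return selected

# Hypothetical well-formed causality matrix: rows are symptoms, columns are three problems.
MATRIX = {
    "s1": (1, 0, 0),
    "s2": (0, 1, 0),
    "s3": (1, 1, 0),
    "s4": (0, 0, 1),
}

print(select_symptoms(MATRIX, radius=1))
print(select_symptoms(MATRIX, radius=2))
```

A larger radius forces more symptoms into the codebook, trading monitoring cost for tolerance to lost or spurious symptoms.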
33. A method of generating a codebook for use in a process of detecting problems in a system which generates a plurality of symptoms, the method comprising the steps of:
(1) preparing a causality graph comprising a plurality of nodes each corresponding to a problem or a symptom, and a plurality of directed edges each corresponding to a causal relation between two or more of said nodes;
(2) making said causality graph well-formed by deleting redundant nodes;
(3) selecting a radius corresponding to a desired level of problem identification;
(4) generating, through the use of a computer, an optimal codebook from said well-formed causality graph by selecting symptom nodes based on comparisons with said radius; and
(5) storing said optimal codebook in a computer storage device.
Dependent claims: 34, 35, 36
37. Apparatus for detecting problems in a system which generates a plurality of symptoms, the apparatus comprising:
a storage device for storing a codebook comprising a matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
monitoring means for monitoring said plurality of symptoms generated by said system over time;
decoding means for reading said values from said codebook and decoding said monitored symptoms into one or more of said likely problems by determining a mismatch measure between one or more of said values read from said codebook and one or more of said monitored symptoms; and
generating means for generating a report comprising said one or more likely problems decoded from said codebook.
Dependent claims: 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56
57. Apparatus for detecting problems in a system which generates a plurality of symptoms, the apparatus comprising:
generating means for generating a causality matrix comprising a first matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
reducing means for reducing said causality matrix into a computer-accessible codebook comprising a second matrix of values fewer in number than said first matrix of values by eliminating duplicative sets of values from said first matrix;
monitoring means for monitoring said plurality of symptoms generated by said system over time through the use of a computer;
decoding means for decoding said monitored symptoms into one or more of said likely problems by determining a mismatch measure between one or more of said values in said codebook and one or more of said monitored symptoms; and
a report generator for reporting said one or more likely problems decoded from said codebook.
Dependent claims: 58, 59, 60, 61, 62, 63
64. Apparatus for generating a codebook for use in detecting problems in a system which generates a plurality of symptoms, the apparatus comprising:
preparing means for preparing a causality matrix comprising a matrix of values each corresponding to a mapping between one of said plurality of symptoms and a likely problem in said system;
means for making said causality matrix well-formed by deleting redundant sets of values from said matrix of values;
inputting means for inputting a radius corresponding to a desired level of problem identification;
generating means for generating a computer-accessible optimal codebook from said well-formed causality matrix by selecting values from said well-formed causality matrix based on comparisons with said radius; and
a storage device for storing said computer-accessible optimal codebook.
Dependent claims: 65, 66, 67, 68
69. Apparatus for generating a codebook for use in detecting problems in a system which generates a plurality of symptoms, the apparatus comprising:
preparing means for preparing a causality graph comprising a plurality of nodes each corresponding to a problem or a symptom, and a plurality of directed edges each corresponding to a causal relation between two or more of said nodes;
means for making said causality graph well-formed by deleting redundant nodes;
specifying means for specifying a radius corresponding to a desired level of problem identification in said system;
generating means for generating, through the use of a computer, an optimal codebook from said well-formed causality graph by selecting symptom nodes based on comparisons with said radius; and
a computer storage device for storing said optimal codebook.
Dependent claims: 70, 71, 72, 73
74. A method of preparing a data structure for use in identifying problems in a system having a plurality of components, said system generating a plurality of observable events, the method comprising the steps of:
(1) preparing first compilable statements which define causal relationships between said observable events and likely problems in said system;
(2) preparing second compilable statements which define propagation properties of said observable events among said components of said system;
(3) preparing a configuration specification which defines relationships among said components of said system;
(4) translating, through the use of a computer, said first and second compilable statements into said data structure by determining a causality closure of said observable events based on said relationships among components of said system and said propagation properties; and
(5) storing said data structure in a computer storage device.
Dependent claims: 75, 76
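The "causality closure" in step (4) of claim 74 can be pictured as a reachability computation. The sketch below is an illustration under stated assumptions: it assumes the compilable statements and the configuration specification have already been expanded into direct cause-to-effect edges between events, and it takes the closure of a problem to be the set of observable events reachable from it. The event names and the `causality_closure` helper are hypothetical.

```python
from typing import Dict, Iterable, List, Set

def causality_closure(edges: Dict[str, List[str]],
                      problems: Iterable[str],
                      observable: Set[str]) -> Dict[str, Set[str]]:
    """Map each problem to the observable events it can cause, directly or indirectly."""
    closure: Dict[str, Set[str]] = {}
    for problem in problems:
        seen: Set[str] = set()
        stack = [problem]
        while stack:  # iterative depth-first traversal; `seen` guards against cycles
            node = stack.pop()
            for effect in edges.get(node, ()):
                if effect not in seen:
                    seen.add(effect)
                    stack.append(effect)
        closure[problem] = seen & observable
    return closure

# Hypothetical direct causal relations after propagation across components.
EDGES = {
    "port_fault":   ["link_loss"],
    "link_loss":    ["timeout", "retransmit"],
    "cpu_overload": ["timeout"],
}

result = causality_closure(EDGES, ["port_fault", "cpu_overload"], {"timeout", "retransmit"})
print(result)
```

Each problem's closure corresponds to one column of a causality matrix, so the output of this step could feed the codebook generation described in the earlier claims.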
Specification