System and method for event monitoring and error detection
Abstract
An event monitor includes an event counter for counting one or more preselected events detected by a sensor. The events are counted over successive time periods, and an indication of system stability is calculated to detect possible error conditions. The indication is a nonweighted or weighted sum of the total number of events counted in the current time period, the total number of events counted in all time periods monitored, and the difference between the total number of events counted in the current time period and the total number counted in the preceding time period. In this manner, the event monitor can identify possible error conditions more quickly than systems using rate thresholding, so that remedial action may be initiated sooner. In a further aspect, system stability is further assessed by counting the number of time periods in which there is an indicium of stability, so that the event monitor can quickly identify a potential error condition that has stabilized and for which remedial action is not necessary. In further aspects, methods for event monitoring and error detection are provided.
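The sum described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `stability_indicium`, the weight parameters, and the sample counts are introduced here, and the first monitored period is assumed to have a previous count of zero, which the text above does not specify.

```python
from typing import Sequence


def stability_indicium(counts: Sequence[int],
                       w_current: float = 1.0,
                       w_total: float = 1.0,
                       w_delta: float = 1.0) -> float:
    """Return the indicium for the most recent period in `counts`.

    `counts` holds one event count per monitored time period, oldest first.
    The three weights are hypothetical; all equal to 1.0 gives the
    nonweighted sum.
    """
    if not counts:
        raise ValueError("at least one time period is required")
    current = counts[-1]                              # events in the current period
    total = sum(counts)                               # events in all periods monitored
    previous = counts[-2] if len(counts) > 1 else 0   # assumption for the first period
    delta = current - previous                        # change since the previous period
    return w_current * current + w_total * total + w_delta * delta


print(stability_indicium([2, 3, 7]))  # 7 + 12 + 4 = 23.0
```

Because the difference term is signed, a falling event count lowers the result while a rising count raises it.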
56 Claims
-
1. An event monitor comprising:
-
an event counter for counting occurrences of one or more preselected types of events;
a timer coupled to the event counter for determining the beginning and ending of a plurality of time periods to be monitored, each of the time periods having a preselected length of time; and
processing logic coupled to the timer and the event monitor for calculating:
a first value equal to or proportional to a total number of events counted in a current time period;
a second value equal to or proportional to a total number of events counted in all time periods monitored;
a third value equal to or proportional to the difference between a total number of events counted in a current time period and a total number of events counted in a previous time period; and
a numerical indicium of system stability, the indicium equal to a sum of said first, second, and third values.
calculating a rate of event occurrence for the current time period;
determining whether the rate exceeds a preselected rate threshold value; and
if the rate exceeds the preselected rate threshold value, outputting an error warning.
-
-
17. The event monitor of claim 16, the processing logic further:
-
for each time period, determining if the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period; and
counting a number of consecutive time periods in which the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period.
-
-
18. The event monitor of claim 17, the processing logic further:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
19. The event monitor of claim 18, wherein outputting an indication of system stability comprises withdrawing a previous error warning.
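The stability test of claims 17 to 19 can be sketched as below. This is a hedged illustration only: `track_warning`, `warn_above`, and `stable_periods` are hypothetical names, the warning is raised by a simple threshold on the indicium chosen for the example, and the sketch uses the "greater than or equal to" comparison when deciding to withdraw the warning.

```python
from typing import List, Optional, Sequence


def track_warning(indicia: Sequence[float],
                  warn_above: float,
                  stable_periods: int) -> List[bool]:
    """Return, for each period, whether an error warning is outstanding."""
    states: List[bool] = []
    warning = False
    run = 0                      # consecutive periods with an unchanged indicium
    previous: Optional[float] = None
    for value in indicia:
        if previous is not None and value == previous:
            run += 1
        else:
            run = 0
        if value > warn_above:
            warning = True       # hypothetical warning condition
        if run >= stable_periods:
            warning = False      # stability indicated: withdraw the warning
        states.append(warning)
        previous = value
    return states


print(track_warning([10, 30, 30, 30, 30], warn_above=20, stable_periods=2))
# [False, True, True, False, False]
```

In this sketch the stability check runs after the warning check, so a high but unchanging indicium ends with the warning withdrawn.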
-
20. The event monitor of claim 16, the processing logic further:
-
for each time period, determining if a total number of events counted in a current time period is less than or equal to a total number of events counted in an immediately preceding time period; and
counting a number of consecutive time periods in which the total number of events counted in the current time period is less than or equal to the total number of events counted in the immediately preceding time period.
-
-
21. The event monitor of claim 20, the processing logic further:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
22. The event monitor of claim 21, wherein outputting an indication of system stability comprises withdrawing a previous error warning.
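Claims 20 to 22 apply the same idea to raw event counts rather than to the indicium. A minimal sketch, assuming a hypothetical helper `non_increasing_run` and made-up counts:

```python
from typing import Sequence


def non_increasing_run(counts: Sequence[int]) -> int:
    """Count the consecutive periods, ending at the latest one, whose event
    count is less than or equal to that of the immediately preceding period."""
    run = 0
    for previous, current in zip(counts, counts[1:]):
        run = run + 1 if current <= previous else 0
    return run


counts = [9, 9, 7, 8, 6, 6, 5]
run = non_increasing_run(counts)
print(run)        # 3: the rise from 7 to 8 resets the run
print(run >= 3)   # True: stability would be indicated with a threshold of 3
```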
-
23. The event monitor of claim 1, the processing logic further:
-
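for each time period, determining if the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period; and
counting a number of consecutive time periods in which the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period.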
-
-
24. The event monitor of claim 23, the processing logic further:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
25. The event monitor of claim 24, wherein outputting an indication of system stability comprises withdrawing a previous error warning.
-
26. The event monitor of claim 1, the processing logic further:
-
for each time period, determining if a total number of events counted in a current time period is less than or equal to a total number of events counted in an immediately preceding time period; and
counting a number of consecutive time periods in which the total number of events counted in the current time period is less than or equal to the total number of events counted in the immediately preceding time period.
-
-
27. The event monitor of claim 26, the processing logic further:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
28. The event monitor of claim 27, wherein outputting an indication of system stability comprises withdrawing a previous error warning.
-
29. The event monitor of claim 1, further comprising one or more additional like event monitors cascaded such that an output of each monitor is coupled to the event counter of another.
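One possible reading of the cascade in claim 29 is sketched below. It assumes, purely for illustration, that the "output" of a monitor is an error warning and that each warning is counted as one event by the next monitor. The class `EventMonitor`, its `warn_above` threshold, and the explicit `end_period` call standing in for the timer are all hypothetical.

```python
from typing import Optional


class EventMonitor:
    """Counts events per period and forwards warnings to a downstream monitor."""

    def __init__(self, warn_above: float,
                 downstream: Optional["EventMonitor"] = None) -> None:
        self.warn_above = warn_above
        self.downstream = downstream
        self.current = 0     # events in the current period
        self.previous = 0    # events in the previous period
        self.total = 0       # events in all periods monitored
        self.warnings = 0

    def record_event(self) -> None:
        self.current += 1
        self.total += 1

    def end_period(self) -> float:
        """Close the period, compute the nonweighted indicium, and cascade."""
        indicium = self.current + self.total + (self.current - self.previous)
        if indicium > self.warn_above:
            self.warnings += 1
            if self.downstream is not None:
                self.downstream.record_event()   # warning becomes a downstream event
        self.previous, self.current = self.current, 0
        return float(indicium)


second = EventMonitor(warn_above=5)
first = EventMonitor(warn_above=10, downstream=second)
for events_in_period in [1, 4, 6]:
    for _ in range(events_in_period):
        first.record_event()
    first.end_period()                 # indicia: 3, 12, 19
second_indicium = second.end_period()
print(first.warnings, second_indicium, second.warnings)  # 2 6.0 1
```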
-
30. A computer-implemented method for monitoring an event stream, comprising:
-
counting the occurrence of one or more preselected events for a plurality of successive time periods; and
calculating a numerical indicium of system stability, the indicium of system stability comprising a sum of:
a first value equal to or proportional to a total number of events counted in a current time period;
a second value equal to or proportional to a total number of events counted in all time periods monitored; and
a third value equal to or proportional to the difference between a total number of events counted in a current time period and a total number of events counted in a previous time period.
determining whether said indicium of system stability exceeds a preselected value; and
if said indicium of system stability exceeds the preselected value, outputting an error warning.
-
-
37. The method of claim 36, wherein the step of outputting an error warning includes outputting a numerical value of said indicium.
-
38. The method of claim 36, further comprising:
if said indicium of system stability exceeds the preselected value, initiating corrective action in response to an error warning.
-
39. The method of claim 38, wherein the corrective action includes shutting down one or more components of the data communication link.
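The warning and corrective-action steps above can be sketched as a single check. The function `check_indicium`, the message format, and the component name `"link-0"` are assumptions made for this example; the claims do not prescribe any of them.

```python
from typing import Callable, List, Optional


def check_indicium(indicium: float,
                   threshold: float,
                   corrective_action: Optional[Callable[[], None]] = None
                   ) -> Optional[str]:
    """Return a warning that includes the indicium value, or None if no warning."""
    if indicium <= threshold:
        return None
    if corrective_action is not None:
        corrective_action()    # for example, shut down a link component
    return f"error warning: stability indicium {indicium} exceeds {threshold}"


shut_down: List[str] = []      # records which hypothetical components were stopped
message = check_indicium(23, 20, corrective_action=lambda: shut_down.append("link-0"))
print(message)     # error warning: stability indicium 23 exceeds 20
print(shut_down)   # ['link-0']
```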
-
40. The method of claim 30, further including:
calculating a rate of event occurrence for the current time period;
determining whether the rate exceeds a preselected rate threshold value; and
if the rate exceeds the preselected rate threshold value, outputting an error warning.
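The rate check of claim 40 can run alongside the indicium check, as in the sketch below. The function `period_warnings` and both threshold values are arbitrary choices for illustration, and the first period is assumed to have a previous count of zero.

```python
from typing import Sequence, Tuple


def period_warnings(counts: Sequence[int],
                    period_seconds: float,
                    rate_threshold: float,
                    indicium_threshold: float) -> Tuple[bool, bool]:
    """Return (rate_warning, indicium_warning) for the latest period."""
    current = counts[-1]
    previous = counts[-2] if len(counts) > 1 else 0   # assumption for the first period
    rate = current / period_seconds                   # events per second this period
    indicium = current + sum(counts) + (current - previous)
    return rate > rate_threshold, indicium > indicium_threshold


# Three 10-second periods with a rising event count.
print(period_warnings([1, 2, 4], 10.0, rate_threshold=0.5, indicium_threshold=10))
# (False, True): rate is 0.4 events/s, indicium is 4 + 7 + 2 = 13
```

With these made-up thresholds the indicium raises a warning on a rising trend before the rate alone would.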
-
-
41. The method of claim 40, further comprising:
-
for each time period, determining if the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period; and
counting a number of consecutive time periods in which the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period.
-
-
42. The method of claim 41, further comprising:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
43. The method of claim 30, further comprising:
-
for each time period, determining if the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period; and
counting a number of consecutive time periods in which the numerical indicium of system stability is the same as the numerical indicium of system stability calculated in an immediately preceding time period.
-
-
44. The method of claim 43, further comprising:
-
determining whether said number of consecutive time periods is greater than or equal to a preselected threshold value; and
if the number of consecutive time periods exceeds the preselected threshold value, outputting an indication of system stability.
-
-
45. The method of claim 44, wherein outputting an indication of system stability comprises withdrawing a previous error warning.
-
46. The method of claim 30, wherein the one or more preselected events are selected from device errors, parity errors, network protocol violations, bus protocol violations, network arbitration errors, bus arbitration errors, data corruption, traffic overload, data routing errors, and network flow control errors.
-
47. The method of claim 30, wherein the numerical indicium is an equally weighted sum of said first, second, and third values.
-
48. The method of claim 30, wherein the numerical indicium is a weighted sum of said first, second, and third values.
-
49. The method of claim 30, wherein the weighted sum is calculated using weighting factors input by a user.
-
50. An information handling system comprising:
-
a processor for executing a program of instructions on the information handling system;
a memory coupled to the processor for storing the program of instructions executable by the processor;
a data communication system coupled to the processor; and
an event monitor, wherein the program of instructions configures the information handling system to count the occurrence of one or more preselected events for a plurality of successive time periods and to calculate a numerical indicium of system stability, the indicium of system stability comprising a sum of:
a first value equal to or proportional to a total number of events counted in a current time period;
a second value equal to or proportional to a total number of events counted in all time periods monitored; and
a third value equal to or proportional to the difference between a total number of events counted in a current time period and a total number of events counted in a previous time period.
-
-
52. A network management system for controlling communications over a network, the network management system comprising an event monitor for detecting an error condition on the network, comprising:
-
a sensor coupled to a network component for detecting the occurrence of one or more preselected events;
an event counter coupled to the sensor for counting the number of events detected by the sensor;
a timer coupled to the event counter for determining the beginning and ending of a plurality of time periods to be monitored, each of the time periods having a preselected length of time; and
processing logic coupled to the timer for calculating a numerical indicium of system stability, said indicium of system stability comprising a sum of:
a first value equal to or proportional to a total number of events counted in a current time period;
a second value equal to or proportional to a total number of events counted in all time periods monitored; and
a third value equal to or proportional to the difference between a total number of events counted in a current time period and a total number of events counted in a previous time period.
-
-
56. A computer readable medium having contents for causing a computer-based information handling system to perform steps for detecting an error condition in a data communication system, the steps comprising:
-
counting the occurrence of one or more preselected events for a plurality of successive time periods; and
calculating a numerical indicium of system stability, the indicium of system stability comprising a sum of:
a first value equal to or proportional to a total number of events counted in a current time period;
a second value equal to or proportional to a total number of events counted in all time periods monitored; and
a third value equal to or proportional to the difference between a total number of events counted in a current time period and a total number of events counted in a previous time period.
-
Specification