Hardware/software based indirect time stamping methodology for proactive hardware/software event detection and control
First Claim
1. A method of analyzing events occurring on a distributed network comprising a plurality of processors, the method comprising:
during offline processing:
updating a recent history table with critical events and any associated non-critical events that may occur in the distributed network;
computing conditional probability values into a probability table, said probability values comprising joint probability values reflecting a probability that a sequence of two or more non-critical events happen before the critical event occurs;
periodically updating the probability table;
periodically examining the probability table to determine when online analysis of the non-critical event is possible; and
generating event masks for use in a masking mechanism to filter a subset of the non-critical events associated with the critical events so that online analysis can be carried out in real-time;
during online processing:
loading the conditional probability table and the event masks computed from offline analysis;
dynamically filtering the non-critical events using the masking mechanism comprising timeout and probability thresholds;
determining that the probability of the occurrence of the critical event has surpassed a threshold level using the conditional probability table;
migrating a process away from the critical event if it is determined that a timeout period has not elapsed; and
if it is determined that the timeout period has elapsed:
reloading the conditional probability tables; and
generating new event masks to filter another subset of the non-critical events.
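The offline/online split in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are all assumptions made for illustration: the `Event` record, the restriction to consecutive pairs of non-critical events, the `horizon` window used to decide that a pair "preceded" a critical event, and the reading of the timeout as the maximum age of the loaded probability table. The names `build_probability_table`, `build_event_mask`, and `OnlineMonitor` are likewise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float     # timestamp, e.g. a globally reset counter value
    name: str       # hypothetical event identifier such as "ecc_error"
    critical: bool  # True for a critical event such as a node failure


def build_probability_table(history, horizon):
    """Offline: estimate P(critical event within `horizon` | pair (a, b) seen in order).

    Only consecutive pairs of non-critical events are counted, the simplest
    case of "a sequence of two or more non-critical events".
    """
    non_critical = [e for e in history if not e.critical]
    critical_times = [e.time for e in history if e.critical]
    seen, followed = Counter(), Counter()
    for a, b in zip(non_critical, non_critical[1:]):
        key = (a.name, b.name)
        seen[key] += 1
        # The pair counts as a hit if a critical event occurs within `horizon` after b.
        if any(b.time <= t <= b.time + horizon for t in critical_times):
            followed[key] += 1
    return {key: followed[key] / seen[key] for key in seen}


def build_event_mask(table, threshold):
    """Offline: keep only event names that belong to a pair at or above `threshold`."""
    return {name for pair, p in table.items() if p >= threshold for name in pair}


class OnlineMonitor:
    """Online: filter events through the mask and act when a learned pair fires."""

    def __init__(self, history, horizon, threshold, timeout, now):
        self.history = list(history)  # private copy of the recent history table
        self.horizon, self.threshold, self.timeout = horizon, threshold, timeout
        self.reload(now)

    def reload(self, now):
        """Recompute the probability table and regenerate the event mask."""
        self.table = build_probability_table(self.history, self.horizon)
        self.mask = build_event_mask(self.table, self.threshold)
        self.loaded_at = now
        self.previous = None

    def observe(self, event):
        """Return 'critical', 'filtered', 'watch', 'migrate', or 'reload'."""
        self.history.append(event)  # keep the history current for later reloads
        if event.critical:
            self.previous = None
            return "critical"
        if event.name not in self.mask:
            return "filtered"  # masked out: cannot contribute to a prediction
        previous, self.previous = self.previous, event
        if previous is None:
            return "watch"
        if self.table.get((previous.name, event.name), 0.0) < self.threshold:
            return "watch"
        if event.time - self.loaded_at < self.timeout:
            return "migrate"  # table still fresh: move work off the endangered node
        self.reload(event.time)  # timeout elapsed: rebuild the table and the mask
        return "reload"
```

In this sketch, a history in which `temp_high` followed by `ecc_error` repeatedly precedes a critical `node_down` yields a mask of just those two names, so every other non-critical event is dropped before any probability lookup. Keying the table on event pairs keeps it small; longer sequences could be handled the same way with longer tuple keys, at the cost of a larger table.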
Abstract
An improved method and apparatus for time stamping events occurring on a large-scale distributed network uses a local counter associated with each processor of the distributed network. All counters are reset at the same time globally, so every event is recorded relative to a common reference time. The counter is stopped when a critical event is detected. Events are masked or filtered, in an online or offline fashion, to prevent non-critical events from triggering a collection by the system monitor or service/host processor. The masking can be done dynamically through the use of an event history logger. The central system may periodically poll a remote processor to read the accurate counter value from its local counter and device control register. Remedial action can be taken when conditional probability calculations performed on the historical information indicate that a critical event is about to occur.
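The counter-based time stamping described in the abstract can be modelled in software to show how the pieces interact. This is a hypothetical simulation, not the hardware design: `Node`, `global_reset`, `poll`, the `CRITICAL_BIT` register flag, and the set-based event mask are all invented for illustration.

```python
from dataclasses import dataclass, field

CRITICAL_BIT = 0x1  # hypothetical control-register bit: "critical event latched"


@dataclass
class Node:
    """One processor with its own local counter (simulated in software)."""
    name: str
    counter: int = 0
    running: bool = True
    control_register: int = 0
    log: list = field(default_factory=list)

    def tick(self, cycles=1):
        """Advance the local counter unless it has been stopped."""
        if self.running:
            self.counter += cycles

    def record(self, event, critical, mask):
        """Stamp an event with the local counter; masked-out non-critical events are dropped."""
        if not critical and event not in mask:
            return False
        self.log.append((self.counter, event))
        if critical:
            self.running = False  # freeze the counter at the critical event
            self.control_register |= CRITICAL_BIT
        return True


def global_reset(nodes):
    """Reset every local counter at the same instant, giving a common time origin."""
    for node in nodes:
        node.counter, node.running, node.control_register = 0, True, 0


def poll(nodes):
    """Host side: read each node's counter value and control register."""
    return {node.name: (node.counter, node.control_register) for node in nodes}
```

Because every counter starts from the same global reset, a stamp of 5 on one node and 8 on another can be ordered directly without clock synchronization messages. A node whose counter has stopped and whose `CRITICAL_BIT` is set tells the polling host both that a critical event occurred and when it occurred.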
5 Claims
Specification