Method and apparatus for monitoring a subsystem within a distributed system for providing an archive of events within a certain time of a trap condition
Abstract
A method and apparatus for monitoring the behavior over time of a distributed system. Time-stamped data descriptive of events at one subsystem are placed into a local buffer. The subsystem is notified of the time when a trap condition occurs at another subsystem. Data having a time-stamp within a certain interval of the occurrence of the trap condition are archived to provide a history of the system for later analysis. Time is determined by a local clock in each subsystem. These clocks are synchronized to ensure accurate correlation between events at different subsystems. Trap conditions are categorized and data descriptive of subsystem states are classified to facilitate selective notification of one or more other subsystems, and selective retention of the data, depending on which category of trap condition has occurred and which class of data has been collected.
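The buffering and archiving behavior described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name `MonitoredSubsystem`, the fixed buffer capacity, and the symmetric window around the trap time are assumptions made only for illustration, and the sketch archives only data already in the buffer when the notification arrives.

```python
from collections import deque


class MonitoredSubsystem:
    """Hypothetical subsystem that buffers time-stamped data locally."""

    def __init__(self, capacity=1000, window=5.0):
        # Assumed: a bounded buffer that discards the oldest records first.
        self.buffer = deque(maxlen=capacity)
        # Assumed: the "certain interval" is symmetric, +/- window seconds.
        self.window = window
        self.archive = []

    def collect(self, timestamp, payload):
        """Place one time-stamped record in the local buffer."""
        self.buffer.append((timestamp, payload))

    def on_trap(self, trap_time):
        """Archive buffered records stamped within the window of trap_time."""
        kept = [rec for rec in self.buffer
                if abs(rec[0] - trap_time) <= self.window]
        self.archive.extend(kept)
        return kept


subsystem = MonitoredSubsystem(capacity=100, window=2.0)
for t in range(10):
    subsystem.collect(float(t), f"event-{t}")

# A trap reported at time 5.0 keeps the records stamped 3.0 through 7.0.
history = subsystem.on_trap(5.0)
```

Because the time stamps come from each subsystem's own clock, a comparison like the one in `on_trap` is only meaningful if the clocks are synchronized, which is the point the abstract makes about correlating events across subsystems.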
18 Claims
1. A method of automatically monitoring the behavior over time of a plurality of interacting subsystems of a distributed system, the method comprising the following steps carried out automatically by the interacting subsystems:
collecting data respecting a first subsystem by using data collection means that is local to the first subsystem;
time-stamping the data;
placing the data in a local buffer in the first subsystem;
detecting occurrence of a trap condition by means of a sensor that is local to a second subsystem;
determining, locally in the second subsystem, at what time the trap condition occurs;
notifying the first subsystem of the occurrence of the trap condition and its time of occurrence; and
archiving any data that have been placed in the local buffer and that carry a time-stamp within a desired interval of the time of the occurrence of the trap condition and thereby providing a history of the first subsystem during the desired time interval.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A distributed system comprising:
a first interacting subsystem including a first clock means, a first sensor operative to detect occurrence of a trap condition, and first logic means responsive to the clock means and the sensor to automatically determine at what time the trap condition occurs and to automatically send a signal notifying another subsystem of the occurrence of the trap condition and the time of occurrence;
a second interacting subsystem including a second clock means, a buffer, a second sensor operative to automatically collect data respecting the second subsystem, and second logic means responsive to the second clock means to automatically time-stamp data collected by the second sensor and place the time-stamped data in the buffer and responsive to a signal from the first subsystem to automatically archive any of the data in the buffer that carry a time stamp within a desired interval of the time of occurrence of the trap condition and thereby provide a history of the second subsystem during the desired time interval; and
communication means operative to carry signals between the first interacting subsystem and the second interacting subsystem.
Dependent claims: 10, 11, 12, 13, 14.
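As an informal illustration of how the elements of claim 9 relate to one another, the following Python sketch models the two subsystems and the signal passed between them. Everything named here is hypothetical: the classes `Detector` and `Recorder`, the threshold used as a stand-in trap condition, the use of `queue.Queue` as the communication means, and the injected clock callables are assumptions for the example, not details from the patent.

```python
import queue
from collections import deque


class Detector:
    """Stand-in for the first subsystem: clock, trap sensor, notifying logic."""

    def __init__(self, clock, channel, threshold):
        self.clock = clock          # callable returning the local time
        self.channel = channel      # assumed communication means
        self.threshold = threshold  # assumed trap condition: reading > threshold

    def sense(self, reading):
        if reading > self.threshold:
            # Determine the time locally and notify the other subsystem.
            self.channel.put({"trap_time": self.clock()})


class Recorder:
    """Stand-in for the second subsystem: clock, buffer, archiving logic."""

    def __init__(self, clock, channel, window):
        self.clock = clock
        self.channel = channel
        self.window = window        # assumed symmetric interval, in seconds
        self.buffer = deque(maxlen=256)
        self.archive = []

    def collect(self, data):
        # Time-stamp with the local clock and place in the buffer.
        self.buffer.append((self.clock(), data))

    def poll(self):
        """Handle pending trap signals by archiving the matching records."""
        while True:
            try:
                signal = self.channel.get_nowait()
            except queue.Empty:
                return
            trap_time = signal["trap_time"]
            self.archive.extend(
                item for item in self.buffer
                if abs(item[0] - trap_time) <= self.window
            )
```

In this sketch the signal carries the time of occurrence as measured by the detecting subsystem, so the recorder never needs to know when the notification itself arrived; only the agreement between the two clocks matters.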
15. In a distributed system of the kind having a plurality of interacting subsystems and communication means therebetween, each interacting subsystem having a local controller and a sensor for gathering data respecting that interacting subsystem, an improvement for monitoring the distributed system over time, the improvement comprising:
a first local clock reference and a first local buffer in communication with the local controller of a first one of the interacting subsystems;
means in the local controller of the first interacting subsystem for automatically time-stamping data gathered by the sensor and placing the time-stamped data in the first local buffer;
a second local clock reference in communication with the local controller of a second one of the interacting subsystems;
means in the local controller of the second interacting subsystem for automatically detecting occurrence of a trap condition from data gathered by the sensor of the second interacting subsystem, determining at what time said trap condition occurs, and notifying the first interacting subsystem of the occurrence of the trap condition and the time of said occurrence; and
means in the local controller of the first interacting subsystem for automatically archiving any time-stamped data in the buffer that carry a time stamp within a desired interval of the time of occurrence of the trap condition and thereby providing a history of the first subsystem during the desired time interval.
Dependent claims: 16, 17, 18.
Specification