Computer flight recorder with active error detection
First Claim
1. A method of detecting errors in a computer using a flight recorder resident in the computer, the flight recorder configured to log trace data for one or more instrumented software entities resident in the computer in response to specific points in executable program code for the one or more instrumented software entities being reached, the method comprising:
- executing the one or more instrumented software entities by the computer to issue work requests to a removable media system of the computer that provides an interface to one or more removable storage devices;
- with the flight recorder, and in response to calls to the flight recorder from the one or more instrumented software entities, logging trace data for the one or more instrumented software entities during operational use of the computer, wherein the flight recorder is configured to log trace data associated with the work requests issued to the removable media system, wherein the logged trace data comprises a plurality of logged trace points, and wherein each logged trace point in the plurality of logged trace points is associated with a work request issued to the removable media system;
- with the flight recorder, detecting a trend in the logged trace data, wherein detecting the trend includes detecting a plurality of trace points in the logged trace data associated with a first work request among the work requests issued to the removable media system;
- with the flight recorder, determining an error associated with the first work request based on the detected trend in the logged trace data; and
- with the flight recorder, asserting an exception and terminating the first work request in response to determining the error such that availability of the removable media system is restored without having to perform a manual reboot for the removable media system.
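The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. The names `FlightRecorder`, `trace`, `WorkRequestError`, `mount_media`, and the `repeat_threshold` value are all hypothetical, and the sketch assumes that the "trend" is the same trace point being logged repeatedly for one work request, which is only one possible reading of the claim language.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class WorkRequestError(Exception):
    """Exception asserted when a trend indicates a failed work request."""


@dataclass
class FlightRecorder:
    # Assumption: this many identical trace points for one request is a trend.
    # The claim itself does not specify a threshold.
    repeat_threshold: int = 3
    log: list = field(default_factory=list)        # logged trace points
    terminated: set = field(default_factory=set)   # terminated request ids
    _counts: dict = field(default_factory=lambda: defaultdict(int))

    def trace(self, request_id, point):
        """Called by instrumented code when a specific code point is reached."""
        # Log the trace point, associated with its work request.
        self.log.append((request_id, point))
        self._counts[(request_id, point)] += 1
        # Detect the trend: several trace points for the same request.
        if self._counts[(request_id, point)] >= self.repeat_threshold:
            # Determine the error, terminate the request, assert an exception.
            self.terminated.add(request_id)
            raise WorkRequestError(f"request {request_id} stuck at {point!r}")


def mount_media(recorder, request_id, attempts):
    """Hypothetical instrumented entity issuing a removable-media request."""
    for _ in range(attempts):
        recorder.trace(request_id, "mount_retry")
    recorder.trace(request_id, "mount_done")
```

In this sketch a caller would catch `WorkRequestError` and continue serving other requests, which stands in for restoring availability without a manual reboot. A request that retries fewer times than the threshold completes normally and is never terminated.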
Abstract
A method, apparatus and program product utilize a flight recorder having active error detection functionality to proactively detect errors in a computer or a sub-system or component thereof. The active error detection may be based on one or more trends detected in the trace data logged by the flight recorder and reflective of particular types of errors that may be present in a computer during operation, such that an error may be logged, and in some instances, an exception may be triggered.
17 Claims
Specification