System and method for fault detection and recovery
Abstract
An apparatus and method for automatically detecting and recovering from a fault in a microprocessor-based system. The apparatus and method use a leaky bucket routine and an event handler procedure. The method may further use object-oriented techniques that abstract the differences between hardware and software faults, allowing a common framework to be developed.
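The abstract names a leaky bucket routine without giving its parameters. The sketch below is a generic leaky bucket error counter, not the patented routine: each event adds to a level, the level drains at a constant rate over time, and the counter reports when the level exceeds a threshold. The class name `LeakyBucket` and the parameters `threshold` and `leak_rate` are hypothetical. Timestamps are passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Generic leaky bucket error counter (hypothetical parameters)."""
    threshold: float        # level above which the bucket is considered to overflow
    leak_rate: float        # units drained per second
    level: float = 0.0
    last_time: float = 0.0  # timestamp (seconds) of the last update

    def _leak(self, now: float) -> None:
        # Drain in proportion to the time since the last update, never below zero.
        elapsed = max(0.0, now - self.last_time)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_time = now

    def add_event(self, now: float, weight: float = 1.0) -> bool:
        """Record one event at time `now`; return True if the threshold is exceeded."""
        self._leak(now)
        self.level += weight
        return self.level > self.threshold
```

With `threshold=3` and `leak_rate=0.1`, four events one second apart overflow the bucket. A single event after a long quiet period does not, because the earlier count has drained away. This is what makes the bucket tolerant of isolated errors while still catching bursts.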
33 Claims
1. A method for automatically detecting and recovering from a fault in a microprocessor-based system, comprising:
reporting the fault as an event;
processing the event including thresholding the event and co-relating the event to a cause;
determining a recovery action as a function of the thresholding, the co-relating, and an elapsed time the system has been running; and
performing the recovery action.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
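The four steps of claim 1 can be sketched as a small event handler. This is an illustrative assumption, not the claimed implementation. Thresholding here uses a sliding-window count per (source, code). Correlation is a lookup in a hypothetical `CAUSES` table. The recovery action depends on whether the threshold was reached, on the correlated cause, and on the elapsed running time carried in each event. The names `Event`, `EventHandler`, `min_uptime` and the action strings are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical: component that reported the fault
    code: str     # hypothetical fault code
    time: float   # seconds since the system started (its elapsed running time)


# Hypothetical correlation table: fault code -> most likely cause.
CAUSES = {"crc_error": "link_hardware", "timeout": "peer_software"}


class EventHandler:
    def __init__(self, threshold: int, window: float, min_uptime: float):
        self.threshold = threshold    # events needed within the window to act
        self.window = window          # sliding-window length in seconds
        self.min_uptime = min_uptime  # below this uptime, escalate instead of a local reset
        self.history = defaultdict(deque)  # (source, code) -> recent event times

    def _reaches_threshold(self, ev: Event) -> bool:
        times = self.history[(ev.source, ev.code)]
        times.append(ev.time)
        # Forget events that have fallen out of the sliding window.
        while ev.time - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold

    def handle(self, ev: Event) -> str:
        """Threshold, correlate, and pick a recovery action for one reported event."""
        if not self._reaches_threshold(ev):
            return "log_only"
        cause = CAUSES.get(ev.code, "unknown")
        if ev.time < self.min_uptime:
            # Repeated faults soon after start-up: a local reset is unlikely to help.
            return "escalate:restart_system"
        return f"reset:{cause}"
```

The same burst of three `crc_error` events yields a system-level escalation shortly after start-up, but only a targeted reset of the correlated cause once the system has been running for a while.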
11. A method for automatically detecting and recovering from a fault in a microprocessor-based system defined as a hierarchical set of objects to model software, hardware, and external entities the system communicates with, the method comprising:
reporting the fault as an event;
processing the event including thresholding the event and co-relating the event to a cause, the thresholding using a leaky bucket algorithm and error counting;
determining a recovery action as a function of the thresholding, the co-relating, an elapsed time the system has been running, relationships imposed by the hierarchical set of objects, event histories, and system state information, wherein root cause correlation of events occurs across all domains and hierarchies of the system to determine recovery actions that provide for specific recoveries and system escalations; and
performing the recovery action.
Dependent claims: 12.
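Claim 11 models the system as a hierarchy of objects and uses that hierarchy both for root-cause correlation and for escalation. The sketch below illustrates both ideas under simple assumptions that are not taken from the claim. Faults on two or more children of the same parent are attributed to the parent. An object whose recovery history already shows `max_recoveries` attempts hands the recovery up to its parent. `ManagedObject`, `correlate`, `choose_target` and the shelf/card/port example are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagedObject:
    """Node in a hypothetical containment hierarchy (e.g. shelf > card > port)."""
    name: str
    parent: Optional["ManagedObject"] = None
    recoveries: int = 0  # event history: recoveries already applied to this object


def correlate(faulty: List[ManagedObject]) -> List[str]:
    """Blame a shared parent when two or more of its children report faults."""
    per_parent = Counter(o.parent.name for o in faulty if o.parent is not None)
    causes = set()
    for o in faulty:
        if o.parent is not None and per_parent[o.parent.name] >= 2:
            causes.add(o.parent.name)
        else:
            causes.add(o.name)
    return sorted(causes)


def choose_target(obj: ManagedObject, max_recoveries: int) -> str:
    """Escalate up the hierarchy once an object has used up its recovery attempts."""
    node = obj
    while node.parent is not None and node.recoveries >= max_recoveries:
        node = node.parent
    node.recoveries += 1
    return node.name
```

For example, faults on `port1` and `port2` of the same card correlate to the card. Repeated recoveries of `port1` alone escalate to the card after two attempts.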
13. A method of automatically detecting and recovering from a fault in a microprocessor-based system, comprising:
reporting the fault as an event;
processing the event including thresholding the event and co-relating the event to a cause selected from a list of potential causes, the list of potential causes including potential hardware causes and potential software causes;
determining a recovery action as a function of the thresholding and the co-relating; and
performing the recovery action.
Dependent claims: 14, 15, 16, 17.
18. A method of automatically detecting and recovering from a fault in a microprocessor-based system, comprising:
reporting the fault as an event;
processing the event including thresholding the event and co-relating the event to a cause selected from a list of potential causes, the list of potential causes including potential causes internal to the microprocessor-based system and potential causes external to the microprocessor-based system; and
determining a recovery action as a function of the thresholding and the co-relating.
Dependent claims: 19, 20, 21, 22.
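Claims 13 and 18 differ only in how the list of potential causes is classified: hardware versus software in claim 13, internal versus external in claim 18. The sketch below tags each potential cause with both classifications. It then correlates recent event codes to the cause whose expected symptoms overlap them most. Symptom-overlap scoring is an assumption chosen for illustration; the claims do not specify a correlation method. `Cause`, `POTENTIAL_CAUSES` and the symptom codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cause:
    name: str
    domain: str          # "hardware" or "software" (the split in claim 13)
    location: str        # "internal" or "external" (the split in claim 18)
    symptoms: frozenset  # event codes this cause is expected to produce


# Hypothetical list of potential causes.
POTENTIAL_CAUSES = [
    Cause("failed_transceiver", "hardware", "internal",
          frozenset({"crc_error", "loss_of_signal"})),
    Cause("driver_hang", "software", "internal",
          frozenset({"timeout", "watchdog"})),
    Cause("far_end_down", "hardware", "external",
          frozenset({"loss_of_signal", "peer_timeout"})),
]


def correlate(recent_codes: Iterable[str]) -> Optional[Cause]:
    """Select the potential cause whose symptoms best match the recent events."""
    observed = set(recent_codes)
    best = max(POTENTIAL_CAUSES, key=lambda c: len(c.symptoms & observed))
    # Return None when no potential cause explains any of the observed codes.
    return best if best.symptoms & observed else None
```

Keeping the classification on the cause lets the recovery step branch on it. For instance, it might notify a peer rather than reset local hardware when the selected cause is external.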
23. A method of automatically detecting and recovering from a fault in a microprocessor-based system, comprising:
creating a list of object-cause pairs relevant to a fault event;
incrementing a count in a leaky bucket associated with each object-cause pair; and
performing an action associated with a threshold if the count of the leaky bucket exceeds the threshold, the action being selected from a list of potential actions, the list of potential actions including hardware actions and software actions.
Dependent claims: 24, 25, 26.
27. A method of automatically detecting and recovering from a fault in a microprocessor-based system, comprising:
creating a list of object-cause pairs relevant to a fault event;
incrementing a count in a leaky bucket associated with each object-cause pair; and
performing an action associated with a threshold if the count of the leaky bucket exceeds the threshold, the action being selected from a list of potential actions, the list of potential actions including actions internal to the microprocessor-based system and actions external to the microprocessor-based system.
Dependent claims: 28, 29, 30.
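Claims 23 and 27 keep one leaky bucket per object-cause pair and fire an action when a bucket's count exceeds a threshold. The sketch below makes several assumptions that the claims do not specify. A hypothetical `RELEVANT_PAIRS` table lists the pairs a fault code implicates. A hypothetical `RULES` table holds ascending (threshold, action) lists per pair. When several thresholds are exceeded, the most severe action wins. Buckets drain through a separate `leak()` call that a periodic timer would invoke. Action prefixes (`sw:`, `hw:`, `ext:`) stand in for the hardware/software and internal/external action categories.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (object, cause)

# Hypothetical: fault code -> object-cause pairs that could explain it.
RELEVANT_PAIRS: Dict[str, List[Pair]] = {
    "crc_error": [("port1", "line_noise"), ("card1", "serdes_fault")],
}

# Hypothetical: pair -> [(threshold, action), ...] sorted by ascending threshold.
# Prefixes tag the action category: "sw:"/"hw:" (claim 23), "ext:" for external (claim 27).
RULES: Dict[Pair, List[Tuple[int, str]]] = {
    ("port1", "line_noise"): [(2, "ext:alarm_far_end")],
    ("card1", "serdes_fault"): [(3, "sw:reset_driver"), (5, "hw:power_cycle_card")],
}


class PairMonitor:
    def __init__(self, rules, relevant_pairs):
        self.rules = rules
        self.relevant_pairs = relevant_pairs
        # One leaky bucket count per object-cause pair.
        self.buckets: Dict[Pair, int] = {pair: 0 for pair in rules}

    def on_fault(self, code: str) -> List[str]:
        """Increment every relevant pair's bucket; return actions whose threshold is exceeded."""
        actions = []
        for pair in self.relevant_pairs.get(code, []):
            if pair not in self.buckets:
                continue
            self.buckets[pair] += 1
            exceeded = [action for limit, action in self.rules[pair]
                        if self.buckets[pair] > limit]
            if exceeded:
                # Rules are sorted, so the last exceeded entry is the most severe action.
                actions.append(exceeded[-1])
        return actions

    def leak(self, amount: int = 1) -> None:
        """Drain every bucket; meant to be called periodically, e.g. from a timer."""
        for pair in self.buckets:
            self.buckets[pair] = max(0, self.buckets[pair] - amount)
```

Six consecutive `crc_error` faults first raise an external alarm. They then add a driver reset and finally escalate to a card power cycle. After a large leak, the same fault no longer triggers anything.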
31. A method for recovering from a fault in a microprocessor-based system, the method comprising:
reporting the fault as an event;
determining an elapsed time the microprocessor-based system has been running;
determining a recovery action as a function of the elapsed time the microprocessor-based system has been running; and
performing the recovery action.
Dependent claims: 32, 33.
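Claim 31 makes the recovery action depend on how long the system has been running. The sketch below measures elapsed running time with `time.monotonic()`, which system clock changes cannot move backwards. It then maps that time onto an action through uptime bands. The band boundaries, the action names, and the policy behind them are hypothetical assumptions. The policy assumes a fault within the first minute suggests a restart would simply loop, while a fault after a long run may come from accumulated state that restarting a single process can clear.

```python
import time
from bisect import bisect_right

_START = time.monotonic()  # reference point taken when the (hypothetical) system starts

# Hypothetical policy: uptime band boundaries in seconds, one action per band.
UPTIME_BOUNDARIES = [60.0, 3600.0]
ACTIONS = ["hold_in_safe_mode", "restart_system", "restart_process"]


def elapsed_running_time() -> float:
    """Seconds since start-up, measured on a monotonic clock."""
    return time.monotonic() - _START


def recovery_for_uptime(uptime_s: float) -> str:
    """Map the elapsed running time onto a recovery action by uptime band."""
    return ACTIONS[bisect_right(UPTIME_BOUNDARIES, uptime_s)]
```

Usage: `recovery_for_uptime(elapsed_running_time())` selects the action at the moment a fault is reported. Using `bisect_right` places a value exactly on a boundary into the higher band.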
Specification