Hierarchy of fault isolation timers
Abstract
In the present invention, a coordinated hierarchy of timing mechanisms preferably cooperates to report errors at different operational levels of a complex of computing devices. Preferably, each timer is able to identify a failure condition at its own level of operation and transmit a time-out condition to a higher level device, which may also be a timer. Upon generation of a time-out condition, a system component experiencing a fault condition preferably continues to operate in a degraded mode, informs devices attempting to communicate with the faulty component of the status of the fault condition, and preferably proceeds to identify and correct the failure which caused the time-out condition. The timers may be implemented in hardware or software.
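The degraded-mode behaviour described in the abstract can be illustrated with a minimal Python sketch. The `Component` class, the `Status` values, and the `recover` callback are hypothetical names introduced here; the abstract does not say how a component reports its status or corrects the failure. The sketch assumes that "degraded mode" means the component keeps answering requests with its fault status instead of servicing them or leaving callers waiting.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"


class Component:
    """Hypothetical component that keeps running after a time-out."""

    def __init__(self, name: str, recover: Callable[[], bool]) -> None:
        self.name = name
        self.status = Status.OK
        # Assumed hook that tries to identify and correct the failure;
        # it returns True when the component is healthy again.
        self._recover = recover

    def on_timeout(self) -> None:
        # Enter a degraded mode instead of halting the whole complex.
        self.status = Status.DEGRADED

    def handle(self, request: str) -> str:
        if self.status is Status.DEGRADED:
            # Tell the requester about the fault rather than leaving it waiting.
            return f"{self.name}: {self.status.value}, request '{request}' not serviced"
        return f"{self.name}: serviced '{request}'"

    def try_recover(self) -> bool:
        # Only attempt recovery when degraded; report whether the component is OK.
        if self.status is Status.DEGRADED and self._recover():
            self.status = Status.OK
        return self.status is Status.OK


if __name__ == "__main__":
    disk = Component("disk", recover=lambda: True)
    disk.on_timeout()
    print(disk.handle("read"))  # disk: degraded, request 'read' not serviced
    print(disk.try_recover())   # True
```

Returning a status string keeps callers from blocking on the faulty component, which is the point of informing devices that attempt to communicate with it.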
20 Claims
1. A method for isolating a fault condition in a complex of computing devices, the method comprising the steps of:
establishing a hierarchy of transactions executed within said complex;
monitoring at least one transaction of said transactions with a transaction-specific timer, wherein each said transaction-specific timer has an expiration time, thereby establishing a hierarchy of timers and a hierarchy of expiration times corresponding to said hierarchy of transactions;
setting said expiration times of said transaction-specific timers to progressively higher values at progressively higher levels of said hierarchy of timers; and
upon expiration of a transaction-specific timer, transmitting a time-out condition to a next higher level timer in said hierarchy of timers, thereby indicating a fault condition within a component of said complex including said expired timer, wherein said component is a faulty component.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
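The timer hierarchy of claim 1 can be sketched in Python, assuming a simulated clock (elapsed time passed in as a number) rather than real hardware or software timers. `TransactionTimer` and `isolate_fault` are hypothetical names. The check in `__post_init__` enforces the claimed ordering, in which expiration times grow at higher levels, so the timer nearest a stalled transaction expires first, sends a time-out to its parent, and names the faulty component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransactionTimer:
    """Hypothetical model of one transaction-specific timer."""
    name: str                                    # component whose transaction is timed
    expiration: float                            # allowed time, in simulated units
    parent: Optional["TransactionTimer"] = None  # next higher level timer
    timeouts_received: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Expiration times must increase at higher levels of the hierarchy.
        if self.parent is not None and self.expiration >= self.parent.expiration:
            raise ValueError(
                f"{self.name}: expiration must be below parent {self.parent.name}"
            )

    def check(self, elapsed: float) -> Optional[str]:
        """If expired, send a time-out to the parent and return this timer's name."""
        if elapsed < self.expiration:
            return None
        if self.parent is not None:
            self.parent.timeouts_received.append(self.name)
        return self.name


def isolate_fault(chain: List[TransactionTimer], stalled_index: int,
                  elapsed: float) -> Optional[str]:
    """chain[0] is the lowest level; a stall at chain[stalled_index] keeps that
    timer and every timer above it running. Return the faulty component, if any."""
    # Lower levels have shorter expirations, so they are checked (and fire) first.
    for timer in chain[stalled_index:]:
        faulty = timer.check(elapsed)
        if faulty is not None:
            return faulty
    return None


if __name__ == "__main__":
    system = TransactionTimer("system", expiration=10.0)
    bus = TransactionTimer("bus", expiration=5.0, parent=system)
    device = TransactionTimer("device", expiration=1.0, parent=bus)
    print(isolate_fault([device, bus, system], stalled_index=1, elapsed=6.0))  # bus
    print(system.timeouts_received)  # ['bus']
```

A stalled transaction keeps every enclosing timer running too, so the ordering matters: with shorter timers below, the lowest running timer fires first and pinpoints the component, whereas an outer timer firing first could only report that something beneath it failed.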
11. A system for preserving an operation of a distributed computing complex during a fault condition, the system comprising:
a plurality of devices operating within a first functional component of said complex;
a sequence of transactions executed in said first functional component linking a plurality of said plurality of devices, wherein the sequence has a first transaction and a last transaction;
a plurality of timers in communication with at least one of said plurality of devices for timing at least one of said sequence of transactions;
means for setting expiration values of said plurality of timers to progressively lower levels for timers of said plurality of timers timing transactions in increasing proximity to said last transaction, thereby establishing a hierarchy of timers and a hierarchy of expiration values;
means for transmitting a time-out condition to a next higher level timer of said plurality of timers upon expiration of a timer of said plurality of timers;
means for logging an error at said higher level timer, thereby identifying a fault condition within said first functional component of said complex;
means for informing other functional components of said complex of said fault condition in said first functional component of said complex, thereby avoiding propagation of said fault condition to said other functional components of said complex and preserving the operation of said complex during said fault condition.
Dependent claims: 12, 13, 14, 15
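Claim 11 can be sketched in Python under several stated assumptions: each transaction's timer reports to the timer of the transaction that precedes it in the sequence (so the last transaction sits at the bottom of the hierarchy), expirations follow a geometric `base * factor**k` scheme, and the component itself takes the report when the first transaction's timer expires. `assign_expirations`, `FunctionalComponent`, and `on_timer_expired` are hypothetical names; the claim only requires values that decrease toward the last transaction, an error logged at the higher level timer, and notification of other components.

```python
from typing import Dict, List, Set


def assign_expirations(sequence: List[str], base: float,
                       factor: float = 2.0) -> Dict[str, float]:
    """Shortest timer on the last transaction, growing toward the first.

    The geometric base * factor**k scheme is an assumption; the claim only
    requires values that decrease toward the last transaction.
    """
    if factor <= 1.0:
        raise ValueError("factor must exceed 1 so the values strictly decrease")
    n = len(sequence)
    return {name: base * factor ** (n - 1 - i) for i, name in enumerate(sequence)}


class FunctionalComponent:
    """Hypothetical functional component owning one transaction sequence."""

    def __init__(self, name: str, sequence: List[str], base: float) -> None:
        self.name = name
        self.sequence = sequence
        self.expirations = assign_expirations(sequence, base)
        self.error_log: List[str] = []
        self.peers: List["FunctionalComponent"] = []
        self.faulty_peers: Set[str] = set()

    def on_timer_expired(self, transaction: str) -> None:
        idx = self.sequence.index(transaction)
        # Assumed: the next higher level timer times the preceding transaction;
        # above the first transaction, the component itself takes the report.
        higher = self.sequence[idx - 1] if idx > 0 else self.name
        # Log the error at the higher level timer, identifying the fault.
        self.error_log.append(f"{higher}: time-out from {transaction}")
        # Inform other functional components so the fault does not propagate.
        for peer in self.peers:
            peer.faulty_peers.add(self.name)


if __name__ == "__main__":
    storage = FunctionalComponent("storage", ["t1", "t2", "t3"], base=1.0)
    cpu = FunctionalComponent("cpu", ["u1"], base=1.0)
    storage.peers.append(cpu)
    print(storage.expirations)   # {'t1': 4.0, 't2': 2.0, 't3': 1.0}
    storage.on_timer_expired("t3")
    print(storage.error_log)     # ['t2: time-out from t3']
    print(cpu.faulty_peers)      # {'storage'}
```

Peers that record the faulty component can stop routing work to it, which is how the sketch models avoiding propagation of the fault to other functional components.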
16. A computer program product having a computer readable medium having computer program logic recorded thereon for isolating a fault condition in a complex of computing devices, the computer program product comprising:
code for establishing a hierarchy of transactions executed within said complex;
code for monitoring at least one transaction of said transactions with a transaction-specific timer, wherein each said transaction-specific timer has an expiration time, thereby establishing a hierarchy of timers and a hierarchy of expiration times corresponding to said hierarchy of transactions;
code for setting said expiration times of said transaction-specific timers to progressively higher values at progressively higher levels of said hierarchy of timers; and
code for transmitting a time-out condition to a next higher level timer in said hierarchy of timers upon expiration of a transaction-specific timer, thereby indicating a fault condition within a component of said complex including said expired timer, wherein said component is a faulty component.
Dependent claims: 17, 18, 19, 20
Specification