Apparatus and methods for controlling restart conditions of a faulted process
Abstract
A system, including a method and an apparatus, is provided for controlling fault conditions in a computer controlled device such as a data communications device. The invention can preferably be provided in a process restarter mechanism within an operating system. In operation, the process restarter detects improper execution of a set of instructions (i.e., a processing failure) and initiates execution of the set of instructions in response. The system repeats the detecting and initiating operations according to a first restart sequence and then repeats them according to a second restart sequence. The second restart sequence initiates execution of the set of instructions in a different sequence than the first restart sequence. For example, the first restart sequence may perform process restarts quickly after failure detection, while the second restart sequence performs process restarts after progressively longer periods of time following failure detection. The quick restarts of the first restart sequence initially provide maximum process uptime, and the delayed or progressively backed-off restarts of the second restart sequence allow a fault condition causing process failure to be remedied. The second restart sequence can include the use of helper processes, which provide passive or active fault management. In active fault management, the helper processes can diagnose and correct the fault condition(s) causing the improper execution of the set of instructions. In passive fault management, the helper processes can diagnose the fault condition and report back to the process restarter. By providing delayed restarts in the second restart sequence along with helper processes, fault management in a device equipped with the invention helps ensure proper and prolonged device operation while minimizing over-utilization of system resources.
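As an illustration only, the two restart sequences described in the abstract can be sketched as a schedule of restart delays. The function name `restart_delays`, its parameters, the doubling factor, and the restart counts are all hypothetical choices made for this sketch; the abstract does not prescribe specific values.

```python
from typing import Iterator


def restart_delays(fast_restarts: int = 3,
                   base_delay: float = 1.0,
                   factor: float = 2.0,
                   backoff_restarts: int = 5) -> Iterator[float]:
    """Yield the delay (in seconds) to wait before each successive restart.

    All default values are assumptions chosen for illustration.
    """
    # First restart sequence: restart right after each detected failure.
    for _ in range(fast_restarts):
        yield 0.0
    # Second restart sequence: wait progressively longer before each restart.
    delay = base_delay
    for _ in range(backoff_restarts):
        yield delay
        delay *= factor


schedule = list(restart_delays())
print(schedule)
```

With these assumed defaults, the first three restarts happen without delay and the following five are spaced by doubling intervals, which gives a failing dependency time to recover before the next attempt.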
35 Claims
-
1. A method of handling processing faults in a computer system, the method comprising the steps of:
-
detecting improper execution of a set of instructions;
initiating execution of the set of instructions in response to the step of detecting;
repeating the steps of detecting and initiating in a first timing pattern according to a first restart sequence; and
repeating the steps of detecting and initiating according to a second restart sequence, wherein the second restart sequence initiates execution of the set of instructions in a second timing pattern, the second timing pattern being different than the first timing pattern of the first restart sequence.
Dependent claims: 2-21
detecting a fault condition associated with the set of instructions; and
determining if the fault condition exceeds a maximum number of fault conditions associated with the first restart sequence, and if not, performing the step of initiating execution of the set of instructions, such that upon each successive step of detecting and initiating according to the first restart sequence, execution of the set of instructions is initiated without a lag time.
-
-
4. The method of claim 1 wherein the second restart sequence performs each step of initiating in response to the step of detecting after expiration of a current time interval that is different than a former time interval of a former repetition of the second restart sequence.
-
5. The method of claim 4 wherein the current time interval is greater than the former time interval.
-
6. The method of claim 1 wherein each repetition of the second restart sequence initiates execution of the set of instructions after waiting progressively longer time intervals in response to the step of detecting.
-
7. The method of claim 1 wherein the second restart sequence includes the steps of:
-
determining a runtime for the set of instructions;
determining a next restart interval based on the runtime for the set of instructions;
performing the step of initiating execution of the set of instructions after expiration of the next restart interval; and
wherein upon each repetition of the second restart sequence, the next restart interval is different.
-
-
8. The method of claim 7 wherein the next restart interval determined in each successive repetition of the second restart sequence is progressively longer in duration than a next restart interval determined in a former repetition of the second restart sequence.
-
9. The method of claim 7 wherein the step of determining a next restart interval uses the runtime for the set of instructions to select a next restart interval from a set of next restart intervals associated with the second restart sequence.
-
10. The method of claim 7 wherein the step of determining a next restart interval determines if the runtime for the set of instructions is less than a current restart interval, and if so, advances the next restart interval based on the current restart interval.
-
11. The method of claim 7 further including the step of determining if the runtime for the set of instructions exceeded a current restart interval, and if so, initiating execution of the set of instructions and terminating the second restart sequence.
-
12. The method of claim 1, wherein a restart interval determining a time between the steps of detecting and initiating in at least one of the first and second restart sequences is programmable.
-
13. The method of claim 1 wherein a restart interval determining a time between the steps of detecting and initiating in at least one of the first and second restart sequences is computed based on a formula based on at least one of a geometric, an exponential, a logarithmic, an incremental, a progressive, a linear, an increasing, a decreasing and a random pattern.
-
14. The method of claim 1 wherein the step of initiating in the second restart sequence is performed at an elapsed time interval measured from a former step of detecting.
-
15. The method of claim 1 wherein the step of initiating in the second restart sequence is performed at an elapsed time interval measured from a former step of initiating.
-
16. The method of claim 1 wherein the step of detecting improper execution of a set of instructions detects a fault due to a resource required by the set of instructions.
-
17. The method of claim 1 wherein the step of detecting improper execution of a set of instructions detects a fault due to a hung process required by the set of instructions.
-
18. The method of claim 1 further including the steps of:
initiating execution of a set of helper instructions in response to the step of detecting improper execution of a set of instructions, the set of helper instructions performing functions to diagnose and handle processing faults in the computer system causing the improper execution of the set of instructions.
-
19. The method of claim 18 wherein the set of helper instructions executed is selected from a plurality of sets of helper instructions in which each set is designed to identify a specific fault condition related to the improper execution of the set of instructions.
-
20. The method of claim 7 wherein the second restart sequence further includes the steps of:
initiating execution of a set of helper instructions in response to the step of detecting improper execution of a set of instructions during the second restart sequence, the set of helper instructions performing functions to assist in the handling of processing faults in the computer system.
-
21. The method of claim 20 wherein the step of initiating execution of the set of helper instructions selects the set of helper instructions to be executed based upon the next restart interval.
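The runtime-based interval selection of claims 7 through 11 could look roughly like the following sketch. It assumes the restart intervals are held in an ordered list (one reading of the "set of next restart intervals" in claim 9) and that a runtime exactly equal to the current interval is treated like a short runtime; the claims do not address that boundary case. The name `next_restart_interval` and the interval values are hypothetical.

```python
from typing import Optional, Sequence


def next_restart_interval(runtime: float,
                          current_index: int,
                          intervals: Sequence[float]) -> Optional[int]:
    """Return the index of the next restart interval, or None to end the
    second restart sequence (hypothetical helper, for illustration)."""
    current = intervals[current_index]
    if runtime > current:
        # The process outlived the current interval: restart it and leave
        # the second restart sequence (compare claim 11).
        return None
    # The process failed quickly: advance to the next, longer interval
    # (compare claim 10), staying on the last entry once it is reached.
    return min(current_index + 1, len(intervals) - 1)


# Assumed interval table in seconds, progressively longer (compare claim 8).
INTERVALS = [1.0, 5.0, 30.0, 120.0]
print(next_restart_interval(0.2, 0, INTERVALS))   # short runtime: advance
print(next_restart_interval(10.0, 1, INTERVALS))  # long runtime: sequence ends
```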
-
22. A method for handling faults in a computer system, the method comprising the steps of:
-
detecting a fault condition which causes improper execution of a set of instructions;
determining a period of time to wait in response to detecting the fault condition;
waiting the period of time in an attempt to allow the fault condition to be minimized;
initiating execution of the set of instructions after waiting the period of time;
repeating the steps of detecting, determining, waiting and initiating, wherein each repeated step of determining a period of time determines a time period based on a formula based on at least one of a geometric, an exponential, a logarithmic, an incremental, a progressive, a linear, an increasing, a decreasing and a random pattern.
Dependent claims: 23
initiating execution of a set of helper instructions to diagnose and correct the fault condition, the step of initiating execution of a set of helper instructions being performed concurrently with the step of waiting the period of time.
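A few of the waiting-period patterns named in claim 22 can be sketched as simple formulas over the attempt number. The function `wait_period`, the `base` parameter, and the particular formula chosen for each pattern are assumptions for illustration; the claim names the patterns but does not define their formulas.

```python
import math
import random


def wait_period(pattern: str, attempt: int, base: float = 1.0, rng=None) -> float:
    """Return a wait period in seconds for the given 1-based attempt number.

    The formulas below are illustrative assumptions, not taken from the claims.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if pattern == "linear":
        return base * attempt                  # base, 2*base, 3*base, ...
    if pattern == "geometric":
        return base * 2 ** (attempt - 1)       # base, 2*base, 4*base, ...
    if pattern == "logarithmic":
        return base * math.log(attempt + 1)    # grows ever more slowly
    if pattern == "random":
        # Uniform draw between 0 and the linear value for this attempt.
        return (rng or random).uniform(0.0, base * attempt)
    raise ValueError(f"unknown pattern: {pattern}")


print([wait_period("geometric", n) for n in range(1, 5)])
```

Passing a seeded `random.Random` instance as `rng` makes the random pattern reproducible, which is convenient when testing a restarter.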
-
-
24. A method for fault management in a computer controlled device, the method comprising the steps of:
-
detecting a fault condition associated with a process;
determining a runtime for the process;
determining a restart interval based on the runtime for the process;
executing a helper process associated with the restart interval to diagnose and remedy the fault condition associated with the process, the helper process executing within the restart interval;
initiating execution of the process after expiration of the restart interval.
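The ordering in claim 24, where a helper runs within the restart interval and the process is restarted only after the interval expires, might be sketched with a thread as below. The restart interval is passed in directly rather than derived from a runtime, and `handle_fault`, `restart`, and `helper` are hypothetical names. As a simplification, a helper that overruns the interval is left running in the background while the restart proceeds.

```python
import threading
import time
from typing import Callable, List


def handle_fault(restart: Callable[[], None],
                 helper: Callable[[], None],
                 restart_interval: float) -> None:
    """Run the helper during the restart interval, then restart the process."""
    deadline = time.monotonic() + restart_interval
    worker = threading.Thread(target=helper, daemon=True)
    worker.start()
    # Give the helper at most the length of the restart interval.
    worker.join(timeout=restart_interval)
    # Wait out whatever is left of the interval before restarting.
    remaining = deadline - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)
    restart()


events: List[str] = []
handle_fault(restart=lambda: events.append("restart"),
             helper=lambda: events.append("helper"),
             restart_interval=0.05)
print(events)
```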
-
-
25. A computer controlled device comprising:
-
a processor;
an input mechanism;
an output mechanism;
a memory/storage mechanism;
an interconnection mechanism coupling the processor, the input mechanism, the output mechanism, and the memory/storage mechanism;
the memory/storage mechanism maintaining a process restarter that executes in conjunction with the processor, the process restarter detecting improper execution of a set of instructions on the processor and initiating execution of the set of instructions in response to detecting improper execution, the process restarter repeatedly performing the detecting and initiating operations in a first timing pattern according to a first restart sequence, and the process restarter repeatedly performing the detecting and initiating operations according to a second restart sequence, wherein the second restart sequence causes the process restarter to initiate execution of the set of instructions in a second timing pattern, the second timing pattern being different than the first timing pattern of the first restart sequence.
Dependent claims: 26, 27, 28
the process restarter, in the second restart sequence, performs the operation of detecting, and then waits for expiration of a restart interval before performing the operation of initiating; and
wherein each restart interval between successive repetitions of the second restart sequence becomes progressively longer in duration.
-
-
27. The computer controlled device of claim 25 wherein each restart interval is computed based on a formula based on at least one of a geometric, an exponential, a logarithmic, an incremental, a progressive, a linear, an increasing, a decreasing and a random pattern.
-
28. The computer controlled device of claim 25 further including a helper process that resides in the memory/storage mechanism and executes in conjunction with the processor, the helper process executing during the expiration period of the restart interval during the second restart sequence in order to diagnose and correct at least one fault condition causing the improper execution of the set of instructions.
-
29. A computer controlled device comprising:
-
a processor;
an input mechanism;
an output mechanism;
a memory/storage mechanism;
an interconnection mechanism coupling the processor, the input mechanism, the output mechanism, and the memory/storage mechanism;
means for detecting a fault condition associated with a process stored in the memory/storage mechanism;
means for determining a runtime for the process;
means for determining a restart interval based on the runtime for the process;
means for executing a helper process associated with the restart interval to diagnose and remedy the fault condition associated with the process, the helper process executing within the restart interval;
means for initiating execution of the process after expiration of the restart interval; and
means for repeating operation of the means for detecting, means for determining, means for executing, and means for initiating at restart intervals that differ from one repetition to a next repetition.
Dependent claims: 30
-
-
31. A computer program product having a computer-readable medium including computer program logic encoded thereon for controlling faults in a computer controlled device, such that the computer program logic, when executed on at least one processing unit within the computer controlled device, causes the at least one processing unit to perform the steps of:
-
detecting improper execution of a set of instructions;
initiating execution of the set of instructions in response to the step of detecting;
repeating the steps of detecting and initiating in a timing pattern according to a first restart sequence; and
repeating the steps of detecting and initiating according to a second restart sequence, wherein the second restart sequence initiates execution of the set of instructions in a second timing pattern, the second timing pattern being different than the first timing pattern of the first restart sequence.
Dependent claims: 32, 33
determining a runtime for the set of instructions;
determining a next restart interval based on the runtime for the set of instructions;
performing the step of initiating execution of the set of instructions after expiration of the next restart interval; and
wherein upon each repetition of the second restart sequence, the next restart interval is different.
-
-
33. The computer program product of claim 31 wherein the computer program logic is embedded within an operating system for a computer controlled device.
-
34. A process control block data structure maintained in a computer readable medium, the process control block data structure maintaining information about an instantiation of a process and information about at least two restart sequences used to reinitiate the process in the event of a failure of the process, the at least two restart sequences having different respective timing patterns.
-
35. A propagated data signal representing a process control block data structure, the process control block data structure maintaining information about an instantiation of a process and information about at least two restart sequences used to reinitiate the process in the event of a failure of the process, the at least two restart sequences having different respective timing patterns.
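One possible shape for the process control block of claims 34 and 35 is sketched below: a record holding information about a process instance together with at least two restart sequences that have different timing patterns. Every field name, the `delay_for_next_restart` method, and the example values are hypothetical; the claims do not specify a layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartSequence:
    name: str
    delays: List[float]  # seconds to wait before each restart in this sequence


@dataclass
class ProcessControlBlock:
    pid: int
    command: str
    fault_count: int = 0
    sequences: List[RestartSequence] = field(default_factory=list)

    def delay_for_next_restart(self) -> float:
        """Walk the sequences in order; reuse the final delay once exhausted.

        Assumes at least one non-empty sequence is configured.
        """
        remaining = self.fault_count
        for seq in self.sequences:
            if remaining < len(seq.delays):
                return seq.delays[remaining]
            remaining -= len(seq.delays)
        return self.sequences[-1].delays[-1]


pcb = ProcessControlBlock(
    pid=42,
    command="example-daemon",
    sequences=[
        RestartSequence("first", [0.0, 0.0, 0.0]),   # immediate restarts
        RestartSequence("second", [1.0, 2.0, 4.0]),  # backed-off restarts
    ],
)
pcb.fault_count = 3
print(pcb.delay_for_next_restart())
```

Keeping the fault count next to the sequences lets a restarter decide, from the control block alone, which sequence the next restart belongs to.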
Specification