Method and apparatus for ensuring proper functionality of a shared memory, multiprocessor system
Abstract
The present invention relates to a method and apparatus for ensuring fault detection and system recovery in a multiprocessor computing system. The system comprises a multitude of processing element modules, input/output processor modules and shared memory modules. Each module within the system includes a sanity timer, of identical period across all modules, capable of resetting the module once a predetermined limit count is reached. If a global clear signal from the operating system scheduler is not received by all modules before the sanity timers expire, a system-wide reset is effected. Each processing element module further includes a watchdog timer capable of resetting the module once a predetermined limit count is reached. Running a process on a processing element effectively clears its watchdog timer; if the operating system scheduler does not run a process on the processing element before the watchdog timer expires, the processing element is reset and removed from service.
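The two-timer arrangement described in the abstract can be illustrated with a minimal tick-based sketch. This is not the patented implementation: the class names, the use of integer tick counts, and the returned event strings are all assumptions made for illustration only.

```python
class CountdownTimer:
    """Hypothetical timer: counts ticks since the last clear."""

    def __init__(self, limit):
        self.limit = limit  # predetermined limit count
        self.count = 0

    def clear(self):
        # Put the timer back in its initial condition.
        self.count = 0

    def tick(self):
        # Advance one tick; report whether the limit count is reached.
        self.count += 1
        return self.count >= self.limit


class ProcessingElement:
    """Hypothetical processing element with a watchdog and a sanity timer."""

    def __init__(self, name, watchdog_limit, sanity_limit):
        self.name = name
        self.watchdog = CountdownTimer(watchdog_limit)
        self.sanity = CountdownTimer(sanity_limit)
        self.in_service = True

    def run_process(self):
        # The scheduler running a process on this element clears the watchdog.
        self.watchdog.clear()

    def global_clear(self):
        # The external global clear signal clears the sanity timer.
        self.sanity.clear()

    def tick(self):
        # Returns "system_reset", "element_reset", or None.
        if not self.in_service:
            return None
        sanity_expired = self.sanity.tick()
        watchdog_expired = self.watchdog.tick()
        if sanity_expired:
            return "system_reset"
        if watchdog_expired:
            self.in_service = False  # reset and removed from service
            return "element_reset"
        return None
```

In this sketch an element that is never given a process drops out of service when its watchdog limit is reached, while an element that keeps running processes but never receives the global clear signal eventually reports a system-wide reset.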
15 Claims
1. A processing element for use in a multiprocessor computing system, said processing element including:
a processor block;
an input port for receiving instruction elements for execution by said processor block;
a watchdog timer in operative relationship with said processor block, said watchdog timer capable to cause said processor block to reset after a first predetermined time interval has elapsed from setting the watchdog timer in a first initial condition;
a sanity timer in operative relationship with said processor block, said sanity timer capable to cause said processor block to reset after a second predetermined time interval has elapsed from setting the sanity timer in a second initial condition;
said processor block being responsive to the presence of at least one predetermined executable instruction element to cause said watchdog timer to acquire the first initial condition;
said processing element being responsive to an external signal to cause said sanity timer to acquire the second initial condition.
Dependent claims: 2, 3, 4, 5, 6
7. A multiprocessor computing system comprising:
a plurality of processing elements, each processing element including:
a. a processor block;
b. an input port for receiving instruction elements for execution by said processor block;
c. a watchdog timer in operative relationship with said processor block, said watchdog timer capable to cause said processor block to reset after a first predetermined time interval has elapsed from setting the watchdog timer in a first initial condition;
d. a sanity timer in operative relationship with said processor block, said sanity timer capable to cause said processor block to reset after a second predetermined time interval has elapsed from setting the sanity timer in a second initial condition;
e. said processor block being responsive to the presence of at least one predetermined executable instruction element to cause said watchdog timer to acquire the first initial condition;
f. said processing element being responsive to an external signal to cause said sanity timer to acquire the second initial condition.
Dependent claims: 8, 9, 10, 11, 12
13. A computer readable storage medium including a program element for execution by a multiprocessor computer system to implement an operating system, the multiprocessor computer system including a plurality of processing elements, each processing element including a watchdog timer and a sanity timer, said operating system including:
a scheduler for scheduling execution of processes by the computer system, said scheduler including at least one executable instruction to cause resetting of the watchdog timer of one of the plurality of processing elements of the computer system;
a system audit process that when executed by one of the plurality of processing elements of the computer system causes generation of at least one signal to cause resetting of the sanity timer of each one of the plurality of processing elements of the computer system.
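The division of labour recited in claim 13 (the scheduler clears one element's watchdog timer, the system audit process clears every element's sanity timer) might be sketched as follows. The names `Element`, `Scheduler`, `audit_period` and `advance_time` are hypothetical, and running the audit after every `audit_period` dispatches is an assumption of this sketch, not something the claim specifies.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical processing element holding two tick counters."""
    watchdog_count: int = 0
    sanity_count: int = 0


def advance_time(elements, ticks):
    # Let both timers on every element run for the given number of ticks.
    for element in elements:
        element.watchdog_count += ticks
        element.sanity_count += ticks


class Scheduler:
    """Hypothetical scheduler that also schedules the system audit."""

    def __init__(self, elements, audit_period):
        self.elements = elements
        self.audit_period = audit_period  # assumed: audit every N dispatches
        self.dispatches = 0

    def dispatch(self, index):
        # Dispatching a process resets the watchdog of that element only.
        self.elements[index].watchdog_count = 0
        self.dispatches += 1
        if self.dispatches % self.audit_period == 0:
            self.run_system_audit()

    def run_system_audit(self):
        # The audit signal resets the sanity timer of every element.
        for element in self.elements:
            element.sanity_count = 0
```

The point of the split is that a watchdog reset is local to the element on which a process was dispatched, whereas the sanity reset is a single signal that reaches all elements.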
14. A processing element for use in a multiprocessor computing system, said processing element including:
processing means;
input means for receiving instruction elements for execution by said processing means;
a watchdog timer means in operative relationship with said processing means, said watchdog timer means capable to cause said processing means to reset after a first predetermined time interval has elapsed from setting the watchdog timer means in a first initial condition;
a sanity timer means in operative relationship with said processing means, said sanity timer means capable to cause said processing means to reset after a second predetermined time interval has elapsed from setting the sanity timer means in a second initial condition;
said processing means being responsive to the presence of at least one predetermined executable instruction element to cause said watchdog timer means to acquire the first initial condition;
said processing element being responsive to an external signal to cause said sanity timer means to acquire the second initial condition.
15. A method for preventing a processing element of a multiprocessor computer system from being reset, said processing element comprising:
a processor block;
an input port for receiving instruction elements for execution by said processor block;
a watchdog timer in operative relationship with said processor block, said watchdog timer capable to cause said processor block to reset after a first predetermined time interval has elapsed from setting the watchdog timer in a first initial condition;
a sanity timer in operative relationship with said processor block, said sanity timer capable to cause said processor block to reset after a second predetermined time interval has elapsed from setting the sanity timer in a second initial condition;
said method comprising the steps of:
a. executing by said processor block at least one instruction that causes said watchdog timer to acquire the first initial condition, repeatedly at a rate selected to prevent the watchdog timer from resetting said processing element;
b. supplying an external signal to said processing element to cause said sanity timer to acquire the second initial condition, repeatedly at a rate selected to prevent the sanity timer from resetting said processing element.
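Steps a and b both depend on a rate "selected to prevent" a timer from expiring. A minimal sketch of that condition follows, assuming integer ticks, a timer that expires when its count reaches the timeout, and expiry being checked before a clear that falls on the same tick. The function names `survives` and `stays_in_service` are hypothetical.

```python
def survives(kick_period, timeout, duration):
    """Return True if a timer cleared every kick_period ticks never expires.

    Assumed model: the count rises by one per tick, the timer expires when
    the count reaches timeout, and expiry wins a tie with a clear.
    """
    count = 0
    for t in range(1, duration + 1):
        count += 1
        if count >= timeout:
            return False  # timer expired: the processing element is reset
        if t % kick_period == 0:
            count = 0  # timer returned to its initial condition
    return True


def stays_in_service(watchdog_kick, watchdog_timeout,
                     sanity_kick, sanity_timeout, duration):
    # Both timers must be cleared often enough for the element to survive.
    return (survives(watchdog_kick, watchdog_timeout, duration)
            and survives(sanity_kick, sanity_timeout, duration))
```

Under these assumptions the element avoids reset only when each clearing period is strictly shorter than the corresponding timeout; a period equal to the timeout already fails.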
Specification