Alternate processor continuation of task of failed processor
First Claim
1. A method in a multi-processor computer system of continuing the execution of a program or program task which is terminated before completion when executing on a processor which fails due to a hard error condition, comprising the steps of:
copying contents of registers in a failing processor into storage to store a predetermined program continuation interruption state when the processor detects a hard error condition;
sending a signal identifying the failing processor to another processor which is operational;
checking by the other processor of validity of contents stored by the copying step; and
signalling healthy processor(s) in the system of a request for a healthy processor to continue execution of the program or program task if the checking step finds validity;
selecting of a healthy processor to continue execution of the program or program task by signalling remaining healthy processor(s) of a selection, and
loading into a selected processor from storage the stored program continuation interruption state of the failing processor to continue execution of the program or program task from a last successfully executed instruction without having any abnormal end indicated for the program or program task.
1 Assignment
0 Petitions
Abstract
Completes on another CPU the execution of a program, or program task, terminated by a processor error on a first CPU, without re-executing any successfully completed instructions and without any abnormal ending being provided to the program. The continued program need not have any built-in recovery or correction code. Predetermined register contents in the failed processor are stored in predetermined storage locations by the failing processor, or by a service processor (SP) when the failing processor has not been able to store this information. The predetermined contents saved from the failed processor are those defined by the system architecture for saving an interruption of a program, so that execution of the program can continue after the contents of the PSWs, CRs, FPRs, GPRs, ARs, etc. are restored, if using the ESA/370 architecture. When a failed processor is detected, the SP issues an external interruption to the other processors in the system that are operable for continuing the execution of the failed processor's task, after the required information is stored. Special indicators are stored in predetermined places in the system and/or microcode memory that are accessible to the SP and to the healthy processors in the system selectable for continuing the task's execution.
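The flow described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the patented implementation: the names `SavedState`, `Processor`, `state_is_valid` and `continue_on_alternate` are hypothetical, the saved state is reduced to a PSW and 16 GPRs, the validity check is a simple flag plus a length test, and the "first healthy processor wins" selection rule is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_GPRS = 16  # assumption: 16 general purpose registers


@dataclass
class SavedState:
    """Hypothetical snapshot of the interruption state of a failing processor."""
    psw: int                 # stands in for the program status word
    gprs: List[int]          # general purpose register contents
    stored_ok: bool = True   # stands in for the stored validity indicator


@dataclass
class Processor:
    cpu_id: int
    healthy: bool = True
    psw: int = 0
    gprs: List[int] = field(default_factory=lambda: [0] * NUM_GPRS)

    def store_state(self) -> SavedState:
        # Copy register contents so another processor can pick them up.
        return SavedState(psw=self.psw, gprs=list(self.gprs))

    def load_state(self, state: SavedState) -> None:
        # Restore the saved contents; the task continues from this state.
        self.psw = state.psw
        self.gprs = list(state.gprs)


def state_is_valid(state: SavedState) -> bool:
    # Illustrative validity check on the stored contents.
    return state.stored_ok and len(state.gprs) == NUM_GPRS


def continue_on_alternate(failing: Processor,
                          others: List[Processor],
                          storage: Dict[int, SavedState]) -> Optional[Processor]:
    """Return the processor that took over the task, or None if none could."""
    failing.healthy = False
    storage[failing.cpu_id] = failing.store_state()   # copying step
    state = storage[failing.cpu_id]
    if not state_is_valid(state):                     # checking step
        return None
    candidates = [p for p in others if p.healthy]     # signalled healthy processors
    if not candidates:
        return None
    selected = candidates[0]                          # assumption: first one wins
    selected.load_state(state)                        # loading step
    return selected


if __name__ == "__main__":
    cpu0 = Processor(cpu_id=0, psw=0x1000)
    cpu0.gprs[1] = 42
    cpu1 = Processor(cpu_id=1)
    taker = continue_on_alternate(cpu0, [cpu1], {})
    print("task continued on CPU", taker.cpu_id if taker else None)
```

The sketch keeps the storing, checking, selecting and loading steps as separate operations so that each can be read against the corresponding sentence of the abstract; a real system would also cover the case where the service processor stores the state on behalf of the failing processor.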
66 Citations
29 Claims
1. A method in a multi-processor computer system of continuing the execution of a program or program task which is terminated before completion when executing on a processor which fails due to a hard error condition, comprising the steps of:
copying contents of registers in a failing processor into storage to store a predetermined program continuation interruption state when the processor detects a hard error condition;
sending a signal identifying the failing processor to another processor which is operational;
checking by the other processor of validity of contents stored by the copying step; and
signalling healthy processor(s) in the system of a request for a healthy processor to continue execution of the program or program task if the checking step finds validity;
selecting of a healthy processor to continue execution of the program or program task by signalling remaining healthy processor(s) of a selection, and
loading into a selected processor from storage the stored program continuation interruption state of the failing processor to continue execution of the program or program task from a last successfully executed instruction without having any abnormal end indicated for the program or program task.
Specification