Job interrupt at predetermined boundary for enhanced recovery
First Claim
1. A method of restarting a computer system in the event of a failure, the computer system running jobs and having directories relating to data, a main storage area, and at least one direct access storage device, the method comprising the steps of:
detecting the failure;
saving an image of main storage into a nonvolatile storage area in response to detection of the failure;
correcting the failure;
reloading the main storage image into said main storage after correction of the failure;
marking jobs for interruption at a predetermined system boundary; and
running jobs for a predetermined time to permit jobs to attain the predetermined system boundary such that directories are in a known state.
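The claimed steps can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: main storage is modeled as a dictionary, the nonvolatile storage area as a temporary JSON file, and each job carries a hypothetical `steps_to_boundary` counter standing in for the work left before it attains the boundary. All function and field names are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def save_image(main_storage, path):
    """Save an image of main storage to nonvolatile storage (a file here)."""
    path.write_text(json.dumps(main_storage))


def reload_image(path):
    """Reload the saved image once the failure has been corrected."""
    return json.loads(path.read_text())


def mark_jobs(main_storage):
    """Mark every reloaded job for interruption at the boundary."""
    for job in main_storage["jobs"]:
        job["marked"] = True


def run_for(main_storage, time_limit):
    """Run jobs for at most time_limit ticks, stopping early if all arrive.

    Returns the names of jobs that did not attain the boundary in time.
    """
    for _ in range(time_limit):
        pending = [j for j in main_storage["jobs"] if j["steps_to_boundary"] > 0]
        if not pending:
            break
        for job in pending:
            job["steps_to_boundary"] -= 1
    return [j["name"] for j in main_storage["jobs"] if j["steps_to_boundary"] > 0]


def restart_after_failure(main_storage, time_limit):
    """Save, reload, mark, and run, in the order the claim lists the steps."""
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "main_storage.json"
        save_image(main_storage, image)   # in response to detecting the failure
        # Correcting the failure happens here and is outside this sketch.
        restored = reload_image(image)
    mark_jobs(restored)
    stragglers = run_for(restored, time_limit)
    return restored, stragglers
```

In this model, any job still listed in `stragglers` when the time limit expires is one whose directories may still need recovery.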
Abstract
A recovery mechanism restarts jobs following correction of a system failure and automatically marks the jobs for interruption at a logical boundary. The logical boundary is above logical file updating functions such that logical files are in a known state when jobs reach the boundary. When a system failure is detected which has not yet resulted in lost data, an image of working memory, including hardware status, is saved on nonvolatile storage. After the failure has been resolved, the system is initially loaded with operating programs (IPL) and working memory is reloaded from the nonvolatile storage. All jobs which were reloaded are marked for interrupt at a machine instruction boundary, and processing is started. After all jobs have reached the boundary, or a predetermined time has elapsed, processing is stopped and the system is re-IPLed. There are few system index recoveries to be performed, since most jobs reached a point where logical files were synchronized with corresponding data.
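To illustrate why a boundary above the file updating functions leaves logical files in a known state, the sketch below models a job whose saved image caught it partway through a file update: the data record was written but the index entry was not. It is a toy model under assumptions, not the mechanism described in the specification. The sub-step names (`write_data`, `update_index`), the dictionary layout, and both functions are hypothetical.

```python
def is_synchronized(logical_file):
    """True when every data record has exactly one index entry pointing at it."""
    positions = sorted(logical_file["index"].values())
    return positions == list(range(len(logical_file["records"])))


def resume_to_boundary(job, logical_file):
    """Finish the in-flight instruction, then honor the interrupt mark.

    The mark is only checked once the in-flight sub-steps are done, so a
    file update is never left half applied.
    """
    # Below the boundary: complete the sub-steps that were pending when
    # the image of main storage was saved.
    while job["in_flight"]:
        substep, key, value = job["in_flight"].pop(0)
        if substep == "write_data":
            logical_file["records"].append(value)
        elif substep == "update_index":
            logical_file["index"][key] = len(logical_file["records"]) - 1
        else:
            raise ValueError(f"unknown sub-step: {substep}")
    # At the boundary: the job stops here only if it was marked.
    if job["marked"]:
        job["state"] = "interrupted_at_boundary"
    return job["state"]


# Usage: record "r1" was written before the failure, its index entry was not.
logical_file = {"records": ["r0", "r1"], "index": {"k0": 0}}
job = {"marked": True, "state": "running",
       "in_flight": [("update_index", "k1", "r1")]}
print(is_synchronized(logical_file))  # False
resume_to_boundary(job, logical_file)
print(is_synchronized(logical_file))  # True
```

Stopping the job immediately after reload would have left the index behind the data. Letting it run to the boundary first is what removes the need for an index recovery in this model.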
111 Citations
15 Claims
2. A method of restarting a computer system in the event of an undesirable condition, the computer system having logical files relating to data stored on a plurality of storage devices, and tasks and jobs running on the system from a main storage, the jobs having the capability to change logical files when running below a predetermined logical boundary, the method comprising the steps of:
detecting the undesirable condition which has not yet caused a data loss;
saving an image of main storage into a nonvolatile storage area;
correcting the undesirable condition;
reloading the main storage image into said main storage;
marking jobs for interruption at the predetermined logical system boundary; and
running jobs for a predetermined time to permit most jobs to attain the predetermined system boundary such that logical files are in a known state.
Dependent claims: 3, 4, 5, 6, 7
8. A computer system having data directories relating to data stored on said system, the system having a main working storage area which has a job queue from which jobs are selected for operation upon by the system, and at least one nonvolatile storage device, the system being restartable following an undesirable system condition, the system comprising:
means for interrupting the system from operating on the jobs;
means responsive to the means for interrupting the system for saving an image of said main working storage including a representation of the status of the system with respect to the job the system is presently operating upon;
means coupled to said main working storage for reloading the image of said main working storage following correction of the undesirable system condition;
means coupled to said main working storage for marking jobs for interruption at a predetermined system boundary, above which data directories are not normally changed;
means coupled to said main working storage for restarting system operation on jobs where the jobs were interrupted using the reloaded main working storage image; and
means coupled to said main working storage for monitoring jobs running on the system to determine when the jobs have reached the predetermined system boundary such that directories are in a known state.
9. A computer system having data directories relating to data stored on said system, the system having a main working storage area which has a task queue from which tasks and jobs are selected for operation upon by the system, wherein jobs are tasks capable of changing directories, the system comprising:
means coupled to said task queue for marking jobs in the queue for interruption at a predetermined system boundary, above which data directories are not normally changed; and
means coupled to said task queue for monitoring jobs running on the system to determine when the jobs have reached the predetermined system boundary such that directories are in a known state.
Dependent claims: 10, 11, 12, 13, 14, 15
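Claim 9 distinguishes jobs from other tasks by their capability to change directories, and recites one means for marking and one for monitoring. A minimal sketch of that distinction follows. It assumes a task queue modeled as a list of dictionaries. The field names `can_change_directories`, `interrupt_at_boundary`, and `at_boundary` are hypothetical and do not come from the patent.

```python
def mark_jobs_in_queue(task_queue):
    """Mark only the tasks that are jobs, i.e. that can change directories.

    Returns the names of the tasks that were marked.
    """
    marked = []
    for task in task_queue:
        if task["can_change_directories"]:
            task["interrupt_at_boundary"] = True
            marked.append(task["name"])
    return marked


def all_jobs_at_boundary(task_queue):
    """Monitor check: have all marked jobs reached the boundary?"""
    return all(
        task.get("at_boundary", False)
        for task in task_queue
        if task.get("interrupt_at_boundary")
    )
```

Tasks that cannot change directories are left unmarked and are ignored by the monitor, so in this model they never delay the point at which directories are considered to be in a known state.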
Specification