System for recovering failure of online control program with another current online control program acting for failed online control program
First Claim
1. A method for recovering a failure of an online control program in a processing system comprising a plurality of processors, each of which has a local memory and processes an online control program and a corresponding monitor program stored in said local memory, a common memory device which is common to the processors and which is used by each online control program to store quick restart information independently, and a plurality of resources which includes disks, files on said disks, communication controllers and terminals, and which can be shared by the processors, the method comprising:
a first step of inhibiting use of one of the plurality of resources being processed when the failure has occurred in a first online control program being processed by a first of the processors, by referring to quick restart information, arranged in said memory, by a second online control program of a second of the processors, wherein the quick restart information is accessed by a monitor program of the second processor;
a second step of executing a recovering process, using log in one of said disks, by the second online control program; and
a third step of fetching an unprocessed service according to the quick restart information in said common memory device and starting a service employing usable resources, in parallel with the processing of said second step and by said second online control program.
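The three steps of the first claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `QuickRestartInfo` record, the `take_over` function, and the way resources and services are represented are all hypothetical, and threads stand in for whatever parallelism the second online control program actually uses.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class QuickRestartInfo:
    """Hypothetical per-program record kept in the common memory device."""
    in_flight_resources: set = field(default_factory=set)      # being processed at failure
    unprocessed_services: list = field(default_factory=list)   # (service name, resources needed)


def take_over(info, log_records, all_resources):
    """Second online control program acting for the failed first one (illustrative)."""
    # First step: inhibit use of the resources the failed program was processing.
    inhibited = set(info.in_flight_resources)
    usable = set(all_resources) - inhibited

    recovered = []
    started = []

    def recover():
        # Second step: recovering process driven by the log on disk.
        for record in log_records:
            recovered.append(record)

    def serve():
        # Third step: start unprocessed services that need only usable resources.
        for name, needed in info.unprocessed_services:
            if set(needed) <= usable:
                started.append(name)

    # The second and third steps run in parallel.
    workers = [threading.Thread(target=recover), threading.Thread(target=serve)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return inhibited, recovered, started
```

In this sketch, a service that needs an inhibited resource is simply not started; how such a service is deferred or retried after recovery is left open here.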
1 Assignment
0 Petitions
Abstract
In a processing system having a plurality of CPUs, a common storage device is shared by all the CPUs, online control programs are executed by the CPUs, and monitor programs monitor the states of the online control programs and control the online control programs. When a failure of an online control program occurs, the process of the failed online control program can be taken over by another online control program. A method of recovering from the failure of an online control program is characterized by quick restart information for each online control program which is stored separately in the common storage device and separate from a log.
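The separation the abstract describes, one quick restart area per online control program, kept apart from the log, might be modeled as follows. This is only a sketch under stated assumptions: the `CommonStorage` class, its method names, and the field names `resources_in_use` and `pending_services` are hypothetical and do not come from the patent.

```python
class CommonStorage:
    """Hypothetical model of the common storage device shared by all CPUs."""

    def __init__(self):
        # One independent quick restart area per online control program.
        self._areas = {}

    def update(self, program, resources_in_use, pending_services):
        # Each program overwrites only its own area.
        self._areas[program] = {
            "resources_in_use": list(resources_in_use),
            "pending_services": list(pending_services),
        }

    def read(self, program):
        # A monitor program on another CPU can read the area directly,
        # without scanning the log.
        return self._areas[program]


storage = CommonStorage()
log = []  # the log is a separate structure (held on disk in the claims)
```

Keeping the small, frequently overwritten restart record apart from the log is what lets a takeover program learn which resources were in use without first replaying the log.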
45 Citations
7 Claims
4. The method of recovering a failure of an online control program as defined in claim 4, further comprising steps of restarting the first online control program and returning service by said second online control program to the first online control program.
5. A processing system comprising:
a first processor which processes and stores a first online control program;
a second processor which processes and stores a second online control program;
a plurality of resources which are necessary for the processing of the first online control program;
a first memory which stores a log; and
a second memory which stores quick restart information for each online control program;
said second processor including means for inhibiting use of a resource being processed by accessing quick restart information in said second memory when a failure of the first online control program being processed has occurred, and further including means for executing a recovering process by use of the log in said first memory and also starting an unprocessed service employing usable resources by accessing the quick restart information in said second memory.
6. A method for controlling a failure of an online control program in a system including a plurality of processors, each of said processors having means for processing and storing an online control program and a monitor program, a plurality of resources necessary for processing, a first memory for storing quick restart information for each online control program, and a second memory for storing log and database information, the method comprising steps of:
monitoring a first online control program by a first monitor program, the first online control program being processed by a first of the plurality of processors and utilizing first resources to provide a service to a terminal;
detecting a failure of the first online control program by the first monitor program;
accessing the quick restart information by a second monitor program of a second of the plurality of processors in response to the detecting;
transferring the quick restart information by the second monitor program to a second online control program processed by the second processor;
inhibiting use of the first resources by the second online control program according to the quick restart information;
recovering databases by the second online control program according to the log;
processing by the second online control program utilizing second files to provide a portion of the service to the terminal, the portion only corresponding to the second resources wherein the second resources are different from the first resources;
terminating the processing by the second online control program in response to a completion of the recovering;
releasing the inhibiting of the first resources; and
restarting processing by the first online control program to provide the service to the terminal using both the first and second resources.
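The ordering of events in claim 6, from takeover through handing service back to the restarted first program, can be traced with a small Python sketch. It assumes the failure has already been detected by the first monitor program. The names `SharedState` and `fail_over_and_return` and the trace labels are hypothetical, and recovery is shown sequentially for brevity rather than overlapped with the partial service.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical stand-in for the quick restart area and resource inhibition."""
    quick_restart: dict                          # program name -> resources in use
    inhibited: set = field(default_factory=set)
    serving: str = "ocp1"                        # program currently serving the terminal


def fail_over_and_return(state, failed, standby, log, all_resources):
    trace = []
    # The standby's monitor program reads the quick restart information
    # and transfers it to the standby online control program.
    first_resources = set(state.quick_restart[failed])
    trace.append("transfer")
    # Inhibit use of the failed program's (first) resources.
    state.inhibited |= first_resources
    trace.append("inhibit")
    # Provide the portion of the service that needs only the second resources.
    state.serving = standby
    second_resources = set(all_resources) - first_resources
    trace.append("partial-service")
    # Recover databases according to the log.
    for _record in log:
        pass  # placeholder for applying one log record
    trace.append("recovered")
    # On completion of recovery: release the inhibition and restart the first program.
    state.inhibited -= first_resources
    trace.append("release")
    state.serving = failed
    trace.append("restart")
    return trace, second_resources
```

After the final step the restarted first program serves the terminal again with both the first and second resources available, since nothing remains inhibited.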
Specification