Distributed data base system of composite subsystem type, and method of fault recovery for the system
Abstract
In a composite subsystem having a plurality of data base systems and data communication systems on a plurality of processors, a composite subsystem controller unifies the other data base systems of the composite subsystem and the distributed data base systems. When a fault occurs in one subsystem, the controller allows the other subsystems to operate continuously, which facilitates recovery after the faulty subsystem has started up. The controller also keeps track of which data base system each transaction in execution has accessed, so that the range of the failure is confined and fault recovery is facilitated. Check points are detected for two kinds of processing, the updating of information in memory and the accumulation of the updated information in the journal, so that journals earlier than the check point are no longer needed, and a check point dump is acquired without waiting for the end of transactions that were active at the check point.
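The fault-confinement idea in the abstract, together with the saving journal file recited in claim 1, can be illustrated with a minimal Python sketch. Everything named here (`CompositeController`, `JournalRecord`, `Subsystem`, the `completed` flag, and the use of before-images to undo updates) is a hypothetical illustration; the source does not specify the journal layout or the form of the recovery transaction.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key did not exist before the update"


@dataclass
class JournalRecord:
    txn_id: int
    subsystem: str    # subsystem whose data base was updated
    key: str
    before: object    # before-image, assumed here to drive the undo
    completed: bool   # whether the owning transaction finished


@dataclass
class Subsystem:
    name: str
    db: dict = field(default_factory=dict)
    online: bool = True


class CompositeController:
    """Hypothetical controller: one common journal, one saving journal per failed subsystem."""

    def __init__(self, subsystems):
        self.subsystems = {s.name: s for s in subsystems}
        self.journal = []          # journal information already acquired
        self.saving_journals = {}  # subsystem name -> extracted records

    def update(self, txn_id, name, key, value, completed=True):
        sub = self.subsystems[name]
        if not sub.online:
            raise RuntimeError(f"{name} is separated from the on-line system")
        before = sub.db.get(key, _MISSING)
        self.journal.append(JournalRecord(txn_id, name, key, before, completed))
        sub.db[key] = value

    def on_failure(self, name):
        # Separate the failed subsystem, then extract only the records
        # needed to recover its data base into its saving journal.
        self.subsystems[name].online = False
        self.saving_journals[name] = [
            r for r in self.journal
            if r.subsystem == name and not r.completed
        ]

    def recover(self, name):
        # The "defined transaction" is modeled as undoing incomplete
        # updates, newest first, from the saving journal alone.
        sub = self.subsystems[name]
        for rec in reversed(self.saving_journals.pop(name, [])):
            if rec.before is _MISSING:
                sub.db.pop(rec.key, None)
            else:
                sub.db[rec.key] = rec.before
        sub.online = True
```

In this sketch, calling `on_failure("db1")` blocks further updates to `db1` only; updates to any other subsystem keep succeeding until `recover("db1")` brings the repaired subsystem back on line.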
107 Citations
21 Claims
-
1. A failure recovery method, for an on-line system of a composite subsystem type wherein said on-line system includes a plurality of subsystems each performing processing by accessing respective distributed data bases independently and a composite subsystem controller for controlling said subsystems, said method comprising the steps, performed by said composite subsystem controller, of:
-
detecting when a failure has occurred in one of said subsystems; separating the operation of said one subsystem in which said failure has occurred from said on-line system; defining a transaction corresponding to a data base of said one subsystem in which said failure has occurred; and recovering said data base of said one subsystem in which said failure has occurred by executing said defined transaction, while continuing operation of the remainder of the system; wherein said recovering step includes the steps, performed by said composite subsystem controller, of: extracting, from journal information already acquired, information necessary for recovering said data base of said one subsystem in which said failure has occurred, and saving extracted information in a saving journal file corresponding to said data base of said one subsystem in which said failure has occurred to effect recovery on the basis of the information in the saving journal file.
-
-
2. A failure recovery system, for an on-line system of a composite subsystem type wherein said on-line system includes a plurality of subsystems each performing processing by accessing respective distributed databases independently, comprising:
-
a composite subsystem controller for controlling said subsystems, said composite subsystem controller comprising: means for detecting when a failure has occurred in one of said subsystems, means for separating operation of said one subsystem in which said failure has occurred from said on-line system, means for defining a transaction corresponding to a database of said one subsystem in which said failure has occurred, and means for recovering said database of said one subsystem in which said failure has occurred by executing said defined transaction, while continuing operation of the remainder of the system; wherein said means for recovering comprises: means for extracting, from journal information already acquired, information necessary for recovering said data base of said one subsystem in which said failure has occurred, and means for saving extracted information in a saving journal file corresponding to said data base of said one subsystem in which said failure has occurred to effect recovery on the basis of the information in the saving journal file.
-
-
3. A composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are data base systems having respective data bases, comprising:
-
means for discriminating journals, in said common journal file, of transactions which have not been completed upon occurrence of an event causing said composite system to go down; means responsive to the discriminated journals for inhibiting accesses to portions of data bases related to the discriminated journals; means for rerunning the subsystems; and means for recovering the portions of the data bases to which accesses are inhibited, while continuing the operation of the subsystems.
(Dependent claims: 4, 5, 6, 7, 8)
-
-
9. A composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are data base systems having respective data bases, comprising:
-
means for discriminating journals, in said common journal file, of transactions which have not been completed upon occurrence of a failure in a subsystem; means responsive to the discriminated journals for inhibiting accesses to portions of data bases related to the discriminated journals; means for detecting a failure in a subsystem; means for rendering inoperative the failed subsystem; means for recovering the failed subsystem as well as the portions of the data bases to which access is inhibited, while continuing the operation of the other subsystems; and means for returning the recovered subsystem to the system.
-
-
10. A composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are data base systems having respective data bases, comprising:
-
means for detecting a failure outside of a subsystem; a journal saving file for storing journals of transactions which have not been completed by said subsystems which are data communication systems upon occurrence of a failure outside of a subsystem; means responsive to the stored journals in said journal saving file for inhibiting accesses to portions of data bases related to the stored journals; and means for recovering the portions of the data bases to which access is inhibited, while continuing the operation of the subsystems.
-
-
11. A composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are database systems having respective data bases, and said subsystems which are data communication systems including a data output communication system and a data input communication system, comprising:
-
at least one separate distributed data processing system connected to the composite system through said output and input data communication systems; means for detecting a failure in the separate distributed data processing system or in a communication path between the separate distributed data processing system and one of said data input and output communication systems; a journal saving file for storing journals of transactions which have not been completed between said data input communication system and the separate distributed data processing system connected therewith upon occurrence of said failure in the separate distributed data processing system or in said communication path between the separate distributed data processing system and one of said data input and output communication systems; means responsive to the stored journals in said journal saving file for inhibiting accesses of portions of data bases related to the stored journals; and means for recovering the portions of the data bases to which accesses are inhibited, while continuing the operation of the subsystems.
(Dependent claims: 17, 18, 19)
-
-
12. A failure recovery method for a composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are database systems having respective data bases, comprising the steps of:
-
discriminating journals, in said common journal file, of transactions which have not been completed upon occurrence of an event causing said composite system to go down; inhibiting, responsive to the discriminated journals, accesses of portions of data bases related to the discriminated journals; rerunning the subsystems; and recovering the portions of the data bases to which access is inhibited, while continuing the operation of the subsystems.
-
-
13. A failure recovery method for a composite system having a plurality of subsystems each being one of a database system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems being data base systems having respective data bases, comprising the steps of:
-
detecting a failure in a subsystem; discriminating journals, in said common journal file, of transactions which have not been completed upon occurrence of said failure in said subsystem; inhibiting, responsive to the discriminated journals, accesses of portions of data bases related to the discriminated journals; rendering inoperative the failed subsystem; recovering the failed subsystem as well as the portions of the data bases to which accesses are inhibited, while continuing the operation of the other subsystems; and returning the recovered subsystem to the system.
-
-
14. A failure recovery method for a composite system having a plurality of subsystems each being one of a database system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are data base systems having respective data bases, comprising the steps of:
-
detecting a failure outside of a subsystem; storing, in a journal saving file, journals of transactions which have not been completed by a data input communication system included in said data communication systems upon occurrence of said failure outside of said subsystem; inhibiting, responsive to the stored journals in said journal saving file, accesses of portions of data bases related to the stored journals; and recovering the portions of the data bases to which accesses are inhibited, while continuing the operation of the subsystems.
-
-
15. A failure recovery method for a composite system having a plurality of subsystems each having one of a database system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, said subsystems which are database systems having respective data bases, and said subsystems which are data communication systems including a data output communication system and a data input communication system, comprising:
-
connecting at least one separate data processing system to the composite system through said output and input data communication systems; detecting a failure in the separate data processing system or in a communication path between the separate data processing system and one of said data input and output communication systems; storing, in a journal saving file, journals of transactions which have not been completed between said data input communication system and the separate data processing system connected therewith upon occurrence of said failure in the separate data processing system or in said communication path between the separate data processing system and one of said data input and output communication systems; inhibiting, responsive to the stored journals in said journal saving file, accesses of portions of data bases related to the stored journals; and recovering the portions of the data bases to which access is inhibited, while continuing the operation of the subsystems.
-
-
16. A composite system having a plurality of data base systems and a common journal file for storing journals of transactions executed by said data base systems, comprising:
-
means for discriminating journals, in said common journal file, of transactions which have not been completed upon occurrence of an event causing said composite system to go down; means responsive to the discriminated journals for inhibiting accesses of data base systems related to the discriminated journals; and means for recovering said data base systems to which accesses are inhibited, while continuing the operation of the remainder of the data base systems.
-
-
20. A composite system having a plurality of subsystems, said subsystems each having one of a data base device and a data communication device operating as subsystems, comprising:
-
a common journal file for storing journals of transactions executed by said subsystems; journal saving files, corresponding to said transactions, for storing a part of the journals in said common journal file; means for detecting failed transactions which have not been completed by said subsystems; means for transferring a journal corresponding to said failed transactions in a common journal file to said journal saving file; means for inhibiting accesses to subsystems related to the failed transactions; and means for recovering data base devices in said subsystems related to the failed transactions based on the journal stored in said journal saving file, while continuing the operation of the remainder of the subsystems.
-
-
21. A composite system having a plurality of subsystems each being one of a data base system and a data communication system, and a common journal file for storing journals of transactions executed by said subsystems, comprising:
-
at least one separate distributed data processing system connected to said composite system through output and input data communication systems included in said subsystems which are data communication systems; means for detecting a failure in the separate distributed data processing system or in a communication path between the separate distributed data processing system and one of said subsystems which are data communication systems; a journal saving file for storing journals of transactions which have not been completed between said subsystems which are data communication systems and the separate distributed data processing system connected therewith upon occurrence of a failure in the separate distributed data processing system or in a communication path between the separate distributed data processing system and one of said subsystems which are data communication systems; means responsive to the stored journals in said journal saving file for inhibiting accesses of a subsystem related to the stored journals; and means for recovering the subsystem to which access is inhibited, while continuing the operation of the remainder of the subsystems.
-
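The steps recited in claims 3 and 12 (discriminating journals of incomplete transactions, inhibiting access to the related portions of the data bases, and recovering those portions while the rest keeps running) can be sketched in Python as follows. The tuple-based journal format, the function names, and the use of before-images are assumptions made for illustration only; the claims do not prescribe any of them.

```python
# Hypothetical journal entries (format assumed for this sketch):
#   ("begin", txn_id)
#   ("update", txn_id, db_name, key, before_value)
#   ("end", txn_id)

def incomplete_transactions(journal):
    """Discriminate transactions that have a 'begin' but no 'end' record."""
    started, ended = set(), set()
    for entry in journal:
        if entry[0] == "begin":
            started.add(entry[1])
        elif entry[0] == "end":
            ended.add(entry[1])
    return started - ended


def inhibited_portions(journal, pending):
    """Collect the (db_name, key) portions touched by incomplete transactions."""
    portions = set()
    for entry in journal:
        if entry[0] == "update" and entry[1] in pending:
            portions.add((entry[2], entry[3]))
    return portions


def read(dbs, inhibited, db_name, key):
    """Serve a read unless the portion is currently inhibited."""
    if (db_name, key) in inhibited:
        raise PermissionError(f"{db_name}[{key!r}] is inhibited pending recovery")
    return dbs[db_name][key]


def recover_portions(dbs, journal, pending, inhibited):
    """Restore before-images newest first, then lift the inhibition."""
    restored = set()
    for entry in reversed(journal):
        if entry[0] == "update" and entry[1] in pending:
            _, _, db_name, key, before = entry
            dbs[db_name][key] = before
            restored.add((db_name, key))
    inhibited -= restored
```

Only the keys written by unfinished transactions are fenced off in this sketch, so reads of every other key continue to succeed between the restart and the call to `recover_portions`.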
Specification