Method for improving recovery performance from hardware and software errors in a fault-tolerant computer system
First Claim
1. A method for rapid recovery from a network file server failure, operating on a computer configuration comprising:
- a plurality of computer systems adapted for responding to file server requests, each of said plurality of computer systems comprising;
a computer executing a file server operating system, and a mass storage device connected to said computer;
an additional computer system, comprising;
an additional computer which can execute said file server operating system, and an additional mass storage system comprising at least one mass storage device, connected to said additional computer;
and means for communicating between each of said plurality of computer system and said additional computer system, the recovery method comprising;
running a mass storage access program on said additional computer, said mass storage access program receiving mirroring data from each computer of said plurality of computer systems over said communicating means and writing said mirroring data to said additional mass storage system;
and for each computer system in said plurality of computer systems;
installing a mass storage emulator on said computer system for use by said file server operating system, said mass storage emulator taking mass storage write requests from said file server operating system and sending mirroring data indicative of said write request to said additional computer system over said communicating means;
initiating mirroring of data by writing said data both to said mass storage device of said computer system and through said mass storage emulator and said mass storage access program to said additional mass storage system, where said mass storage access program and said mass storage emulator makes a portion of said additional mass storage system appear as if said portion of said additional mass storage device were an extra mass storage device connected to said computer of said computer system in the same manner as said mass storage device is connected to said computer of said computer system;
and then when a failure of any of said plurality of computer systems is detected, performing at least the following steps;
transferring responsibility for responding to file server requests previously responded to by said failed computer system to said additional computer system; and
continuing to mirror data from said plurality of computer systems that have not failed to said additional computer system so that said additional computer system both responds to file server requests and mirrors data.
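The failure-handling steps of this claim can be pictured with a small in-memory simulation. This is only an illustrative sketch, not the patented implementation: the class names `BackupSystem` and `FileServer`, the function `handle_failure`, and the server names `fs1` and `fs2` are all hypothetical, and Python dictionaries stand in for mass storage devices and the communicating means.

```python
class BackupSystem:
    """Hypothetical stand-in for the additional computer system."""

    def __init__(self):
        self.mirrors = {}         # server name -> {block number: data}
        self.serving_for = set()  # failed servers whose requests are now answered here

    def mirror_write(self, server, block_no, data):
        # Role of the mass storage access program: store mirroring data.
        self.mirrors.setdefault(server, {})[block_no] = data

    def take_over(self, failed_server):
        # Responsibility for the failed server's requests moves here.
        self.serving_for.add(failed_server)

    def serve_read(self, server, block_no):
        if server not in self.serving_for:
            raise RuntimeError(f"{server} has not failed over to the backup")
        return self.mirrors[server][block_no]


class FileServer:
    """Hypothetical stand-in for one of the plurality of computer systems."""

    def __init__(self, name, backup):
        self.name = name
        self.disk = {}
        self.backup = backup
        self.alive = True

    def write(self, block_no, data):
        if not self.alive:
            raise RuntimeError(f"{self.name} has failed")
        self.disk[block_no] = data                           # local mass storage device
        self.backup.mirror_write(self.name, block_no, data)  # mirrored copy


def handle_failure(servers, backup, failed_name):
    # Step 1: transfer responsibility for the failed server's requests.
    servers[failed_name].alive = False
    backup.take_over(failed_name)
    # Step 2 needs no extra work here: surviving servers keep calling mirror_write.


backup = BackupSystem()
servers = {name: FileServer(name, backup) for name in ("fs1", "fs2")}
servers["fs1"].write(0, b"a")
servers["fs2"].write(0, b"b")
handle_failure(servers, backup, "fs1")
servers["fs2"].write(1, b"c")  # mirroring from the non-failed server continues
```

After `handle_failure`, the backup object both answers reads on behalf of `fs1` and keeps accepting mirrored writes from `fs2`, which is the dual role the final step of the claim describes.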
Abstract
A method for providing rapid recovery from a network file server failure through the use of a backup computer system. The backup computer system runs a special mass storage access program that communicates with a mass storage emulator program on the network file server, making the disks (or other mass storage devices) on the backup computer system appear like they were disks on the file server computer. By mirroring data by writing to both the mass storage of the file server and through the mass storage emulator and mass storage access program to the disks on the backup computer, a copy of the data on the file server computer is made. Optionally, selected portions of the data read through the mass storage emulator program can be altered before being returned as the result of the read operation on the file server. In the event of failure of the file server computer, the backup computer can replace the file server, using the copy of the file server's data stored on its disks. A single backup computer can support a plurality of file server computers. Unlike other redundant file server configurations, this method does not require the backup computer system to be running the file server operating system.
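The relationship between the mass storage emulator and the mass storage access program can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual code: every name (`BlockDevice`, `MassStorageAccessProgram`, `MassStorageEmulator`, `MirroredVolume`, the server name `fs1`) is hypothetical, storage is an in-memory dictionary, and a direct Python function call stands in for the communication link between the two computers.

```python
class BlockDevice:
    """In-memory stand-in for a mass storage device (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self.blocks.get(block_no, b"")


class MassStorageAccessProgram:
    """Runs on the backup computer and applies requests to its own disks."""

    def __init__(self):
        self.regions = {}  # file server name -> portion of the backup storage

    def handle(self, server, op, block_no, data=None):
        region = self.regions.setdefault(server, BlockDevice())
        if op == "write":
            region.write(block_no, data)
            return None
        return region.read(block_no)


class MassStorageEmulator:
    """Runs on the file server; has a disk-like interface but forwards requests."""

    def __init__(self, server, send):
        self.server = server
        self.send = send  # callable standing in for the communication link

    def write(self, block_no, data):
        self.send(self.server, "write", block_no, data)

    def read(self, block_no):
        return self.send(self.server, "read", block_no)


class MirroredVolume:
    """Writes every block to all member devices; reads from the first one."""

    def __init__(self, *devices):
        self.devices = devices

    def write(self, block_no, data):
        for device in self.devices:
            device.write(block_no, data)

    def read(self, block_no):
        return self.devices[0].read(block_no)


access_program = MassStorageAccessProgram()
local_disk = BlockDevice()
emulated_disk = MassStorageEmulator("fs1", access_program.handle)
volume = MirroredVolume(local_disk, emulated_disk)
volume.write(7, b"payroll")
```

Because `MassStorageEmulator` exposes the same `write` and `read` methods as `BlockDevice`, the mirroring layer treats the remote disk like a second local disk, which mirrors the abstract's point that the backup's disks appear to be disks on the file server.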
26 Claims
1. A method for rapid recovery from a network file server failure, operating on a computer configuration comprising:
a plurality of computer systems adapted for responding to file server requests, each of said plurality of computer systems comprising;
a computer executing a file server operating system, and a mass storage device connected to said computer;
an additional computer system, comprising;
an additional computer which can execute said file server operating system, and an additional mass storage system comprising at least one mass storage device, connected to said additional computer;
and means for communicating between each of said plurality of computer system and said additional computer system, the recovery method comprising;
running a mass storage access program on said additional computer, said mass storage access program receiving mirroring data from each computer of said plurality of computer systems over said communicating means and writing said mirroring data to said additional mass storage system;
and for each computer system in said plurality of computer systems;
installing a mass storage emulator on said computer system for use by said file server operating system, said mass storage emulator taking mass storage write requests from said file server operating system and sending mirroring data indicative of said write request to said additional computer system over said communicating means;
initiating mirroring of data by writing said data both to said mass storage device of said computer system and through said mass storage emulator and said mass storage access program to said additional mass storage system, where said mass storage access program and said mass storage emulator makes a portion of said additional mass storage system appear as if said portion of said additional mass storage device were an extra mass storage device connected to said computer of said computer system in the same manner as said mass storage device is connected to said computer of said computer system;
and then when a failure of any of said plurality of computer systems is detected, performing at least the following steps;
transferring responsibility for responding to file server requests previously responded to by said failed computer system to said additional computer system; and
continuing to mirror data from said plurality of computer systems that have not failed to said additional computer system so that said additional computer system both responds to file server requests and mirrors data.
2. A method for recovery from failure of a network server in a network configuration comprising a plurality of network servers and a backup network server, said plurality of network servers being interconnected to said backup network server by means for communicating between said plurality of network servers and said backup network server, each of said plurality of network servers and said backup network server being adapted to execute a file serving operating system in order to process file server requests and each of said plurality of network servers and said backup network server comprising at least one attached mass storage device, said method comprising the steps of:
initiating operation of said file serving operating system on said plurality of network servers so that each of said plurality of network servers becomes operative to process file server requests;
selecting at least one of a first and second system configuration, wherein the first system configuration permits said backup network server to operate solely as a backup to said plurality of network servers, whereby file server requests are not processed by said backup network server until a failure of one of said plurality of network servers occurs, and wherein said second system configuration permits said backup network server to process different file server requests from any of said plurality of network servers both prior to and after occurrence of a failure of one of said network servers; and
initiating data mirroring from said each of said plurality of network servers to said backup network server via said means for communicating so that when a data write request is received by any of said plurality of network servers, data contained in said write request is written both to the at least one mass storage device attached to the network server which received the write request and to the at least one mass storage device attached to said backup network server.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10
11. A method for recovery from a file server failure in a network configuration comprising a plurality of network servers and a backup network server interconnected by means for communicating between said plurality of network servers and said backup network server, each of said plurality of network servers and said backup network server comprising a file serving operating system and at least one mass storage device connected thereto, said method comprising the steps of:
selecting a network configuration wherein said plurality of network servers and said backup network server become operative to process different file server requests so that a given file server request is processed either by said backup network server or by at least one of said plurality of network servers; and
initiating data mirroring from said plurality of network servers to said backup network server via said means for communicating so that when a data write request is received by one of said plurality of network servers, data contained in said write request is written both to the at least one mass storage device of said one of said plurality of network servers and to the at least one mass storage device of said backup network server by performing at least the following steps;
executing on each of said plurality of network servers a mass storage emulator program means for emulating a mass storage device so that each of said plurality of network servers appears to have at least one extra mass storage device in addition to the at least one mass storage device connected to each of said plurality of network servers; and
executing on said backup network server at least one mass storage access program means for communicating with said mass storage emulation program on each of said plurality of network servers over said means for communicating and for writing data received over said means for communicating to the at least one mass storage device connected to said backup network server so that when data is written to said mass storage emulation program on any of said plurality of network servers, the data is transferred from said mass storage emulation program on said plurality of network servers to said backup network server and written to the at least one mass storage device connected thereto.
Dependent claims: 12, 13, 14, 15, 16
17. A method for recovery from failure of a network server in a network configuration comprising a plurality of network servers and a backup network server, said plurality of network servers being interconnected to said backup network server by means for communicating between said plurality of network servers and said backup network server, each of said plurality of network servers and said backup network server being adapted to execute a file serving operating system in order to process file server requests and each of said plurality of network servers and said backup network server comprising at least one attached mass storage device, said method comprising the steps of:
initiating operation of said file serving operating system on said plurality of network servers so that each of said plurality of network servers becomes operative to process file server requests;
initiating data mirroring from said each of said plurality of network servers to said backup network server via said means for communicating so that when a data write request is received by any of said plurality of network servers, data contained in said write request is written both to the at least one mass storage device attached to the network server which received the write request and to the at least one mass storage device attached to said backup network server by performing at least the following steps;
executing on each of said plurality of network servers a mass storage emulator program means for emulating a mass storage device so that each of said plurality of network servers appears to have at least one extra mass storage device in addition to the at least one mass storage device connected to each of said plurality of network servers; and
executing on said backup network server at least one mass storage access program means for communicating with said mass storage emulation program on each of said plurality of network servers over said means for communicating and for writing data received over said means for communicating to the at least one mass storage device connected to said backup network server so that when data is written to said mass storage emulation program on any of said plurality of network servers, the data is transferred from said mass storage emulation program on said plurality of network servers to said backup network server and written to the at least one mass storage device connected thereto;
detecting failure of one of said plurality of network servers;
transferring responsibility for processing file server requests previously processed by said failed one of said plurality of network servers to said backup network server; and
continuing data mirroring from the non-failed of said plurality of network servers to said backup network server.
18. A system for recovery from failure of a network server comprising:
a plurality of network servers, each comprising at least one attached mass storage device and a file server operating system, each of said plurality of network servers being adapted to process file server requests from clients;
a backup network server comprising at least one attached mass storage device;
means for communicating between at least one of said plurality of network servers and said backup network server so that data can be exchanged over said means for communicating;
wherein said at least one of said plurality of network servers comprises means for processing file server write requests in such a way that data in file server write requests is written both to said at least one mass storage device attached to said at least one of said plurality of network servers and to said backup network server via said means for communicating;
wherein said backup network server comprises means for writing data received over said means for communicating from said at least one of said plurality of network servers to the at least one mass storage device attached to said backup network server in order to mirror said data;
wherein said at least one of said plurality of network servers and said backup network server comprise means for executing a different sequence of instructions when a file server write request is processed by said at least one of said plurality of network servers so that if said at least one of said plurality of network servers encounters a software error that causes a failure, said backup network server will not encounter the same software error; and
wherein said at least one of said plurality of network servers further comprises a mass storage emulation program so that said at least one of said plurality of network servers appears to have another mass storage device attached thereto in addition to said at least one attached mass storage device, and wherein said backup network server comprises a mass storage access program that is adapted to communicate with said mass storage emulation program via said means for communicating and that is adapted to write data received over said means for communicating to the at least one mass storage device attached to said backup network server.
Dependent claims: 19, 20
21. A system for recovery from failure of a network server comprising:
a plurality of network servers, each comprising at least one attached mass storage device and a file server operating system, each of said plurality of network servers being adapted to process file server requests from clients;
a backup network server comprising at least one attached mass storage device;
means for communicating between at least one of said plurality of network servers and said backup network server so that data can be exchanged over said means for communicating;
wherein said at least one of said plurality of network servers is adapted to process file server write requests in such a way that data in file server write requests is written both to said at least one mass storage device attached to said at least one of said plurality of network servers and to said backup network server via said means for communicating;
wherein said backup network server is adapted to write data received over said means for communicating from said at least one of said plurality of network servers to the at least one mass storage device attached to said backup network server in order to mirror said data, and wherein said backup network is further adapted to execute different instructions when a file server request is processed;
wherein said at least one of said plurality of network servers and said backup network server execute a different sequence of instructions when a file server write request is processed by said at least one of said plurality of network servers so that if said at least one of said plurality of network servers encounters a software error that causes a failure, said backup network server will not encounter the same software error; and
wherein said at least one of said plurality of network servers further comprises a mass storage emulation program so that said at least one of said plurality of network servers appears to have another mass storage device attached thereto in addition to said at least one attached mass storage device, and wherein said backup network server comprises a mass storage access program that is adapted to communicate with said mass storage emulation program via said means for communicating and that is adapted to write data received over said means for communicating to the at least one mass storage device attached to said backup network server.
Dependent claims: 22, 23, 24, 25
26. As an article of manufacture, a computer program product comprising a computer-readable medium having program code means encoded thereon for use by a plurality of network servers and a backup network server connected together in a network configuration wherein a means for communicating is connected between each said network server and said backup network server, and each of said plurality of network servers and said backup network server comprising an attached mass storage device, said computer program code means comprising:
a mass storage emulation program means for use by said plurality of network servers, said mass storage emulation program means emulating a mass storage device and providing communication with said backup network server over said means for communicating in order to read data from or write data to the mass storage device attached to said backup network server;
a mass storage access program means for use by said backup network server, said mass storage access program means storing data on, and reading data from, said mass storage device attached to said backup network server and receiving data from a plurality of said mass storage emulation program means loaded on said plurality of said network servers, and said mass storage access program means writing said received data to the mass storage device attached to said backup network server; and
wherein said mass storage emulation program means and said mass storage access program means are together operable in either of a first and second system configuration wherein the first system configuration permits said backup network server to operate solely as a backup to said plurality of network servers, whereby file server requests are not processed by said backup network server until a failure of one of said plurality of network servers occurs, and wherein said second system configuration permits said backup network server to process different file server requests from any of said plurality of network servers both prior to and after occurrence of a failure of one of said network servers.
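The two system configurations recited in claims 2 and 26 differ only in when the backup network server may process file server requests. The rule can be written as a small decision function. This is a sketch for illustration only: the enum `BackupConfiguration`, its member names, and the function `backup_may_process_requests` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class BackupConfiguration(Enum):
    # First configuration: the backup acts solely as a backup until a failure occurs.
    STANDBY_ONLY = 1
    # Second configuration: the backup processes its own, different requests
    # both before and after a failure.
    ACTIVE = 2


def backup_may_process_requests(config, failure_occurred):
    """Return True if the backup server may process file server requests."""
    if config is BackupConfiguration.ACTIVE:
        return True
    # STANDBY_ONLY: requests are processed only once a network server has failed.
    return failure_occurred
```

In either configuration the backup keeps receiving mirrored writes; the selected configuration only governs whether it also answers client requests before a failure.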
Specification