Method and apparatus for implementing fault-tolerant processing without duplicating working process
Abstract
Systems and methods for implementing a memory-efficient fault tolerant computing system are provided. A generic backup process may provide fault tolerance to multiple working processes. The backup process need not include a copy of the code segments executed by the working processes, providing very large savings in memory needed to implement the fault tolerant system. Alternatively, multiple backup processes provide fault tolerance but need not include duplicated code segments for the working processes they support.
38 Claims
1. A method for enabling recovery of an original working process upon failure including:
obtaining state information associated with the original working process;
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information;
wherein obtaining state information comprises:
obtaining a backup memory image of the original working process; and
obtaining current stack frame information of the original working process; and
wherein causing the code segment to execute comprises:
associating information in the backup memory image with the code segment;
associating said current stack frame information with the code segment; and
executing a return into the code segment in response to the current stack frame information.
Dependent claims: 8, 9, 10, 11, 18
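
The recovery steps recited in claim 1 can be sketched roughly as follows. This is a toy model under stated assumptions, not the patented implementation: the "code segment" is Python source text held in a dictionary that stands in for storage, the "backup memory image" is a plain dict, and the "current stack frame information" is reduced to a function name plus the step at which to resume. The names `CODE_STORE`, `StackFrameInfo`, and `recover` are hypothetical. A real system would work with machine-level memory images and return addresses.

```python
from dataclasses import dataclass

# Hypothetical stand-in for storage that holds the working process's code segment.
CODE_STORE = {
    "counter": (
        "def run(memory, start_step, stop_step):\n"
        "    # Continue the loop at start_step using the restored memory image.\n"
        "    for step in range(start_step, stop_step):\n"
        "        memory['total'] += step\n"
        "    return memory['total']\n"
    )
}


@dataclass
class StackFrameInfo:
    entry: str      # function to re-enter (assumed simplification of a stack frame)
    next_step: int  # point at which execution should continue


def recover(name, memory_image, frame, stop_step):
    source = CODE_STORE[name]       # obtain a copy of the code segment
    namespace = {}
    exec(source, namespace)         # load it into a fresh namespace ("memory space")
    memory = dict(memory_image)     # associate the backup memory image with the code
    entry = namespace[frame.entry]  # associate the stack frame information with the code
    # Continue from the recorded point instead of restarting from step 0.
    return entry(memory, frame.next_step, stop_step)


# State as last recorded before the failure: steps 0..2 already summed (0 + 1 + 2 = 3).
backup_image = {"total": 3}
frame_info = StackFrameInfo(entry="run", next_step=3)
result = recover("counter", backup_image, frame_info, stop_step=5)
print(result)  # 10, the same total an uninterrupted run over steps 0..4 would produce
```

Note that the backup side holds only the data (`backup_image`, `frame_info`); the code is fetched separately at recovery time.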
2. A method for enabling recovery of an original working process upon failure including:
obtaining state information associated with the original working process;
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information;
wherein obtaining state information comprises:
obtaining a backup memory image of the original working process; and
obtaining current stack frame information of the original working process; and
wherein causing the code segment to execute comprises:
associating information in the backup memory image with the code segment;
associating said current stack frame information with the code segment; and
restoring a program counter responsive to the state information.
3. A method for enabling recovery of an original working process upon failure including:
obtaining state information associated with the original working process;
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information;
wherein obtaining state information comprises:
obtaining a backup memory image of the original working process; and
obtaining current stack frame information of the original working process; and
wherein causing the code segment to execute comprises:
associating information in the backup memory image with the code segment;
associating said current stack frame information with the code segment; and
restoring a program counter to a state associated with a last checkpoint prior to the failure.
4. A method for generically providing backup for a plurality of working processes comprising:
for each of said plurality of working processes performing:
opening a connection;
designating a memory space;
receiving a plurality of messages on the connection; and
for each of the plurality of messages that is a memory update message, updating the contents of the memory space in response to said each of the plurality of messages; and
for each of the plurality of messages that is a reconnect request indicating previous working process failure, sending information in said memory space to a source of the reconnect request.
Dependent claims: 5, 6
for each of the plurality of messages that is a reconnect request indicating previous working process failure, in response to the reconnect request, obtaining a copy of a code segment associated with said each of said plurality of working processes;
loading said copy;
executing into said copy responsive to information in said memory space.
6. A method as in claim 4 wherein said memory space does not include a code segment for any of said plurality of working processes.
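
The message handling recited in claims 4 and 6 might look something like the sketch below. It is a minimal illustration, assuming an in-memory dict per working process in place of a real connection and memory space. The class name `GenericBackup`, the message dictionaries, and the `"memory_update"` and `"reconnect"` type strings are hypothetical and do not come from the patent.

```python
class GenericBackup:
    """Holds one memory space per working process and no worker code."""

    def __init__(self):
        self.spaces = {}  # connection id -> dict acting as the memory space

    def open_connection(self, conn_id):
        # Designate a memory space for the working process on this connection.
        self.spaces[conn_id] = {}

    def handle(self, conn_id, message):
        kind = message["type"]
        if kind == "memory_update":
            # Update the contents of the memory space for this connection.
            self.spaces[conn_id].update(message["data"])
            return None
        if kind == "reconnect":
            # A restarted worker asks for its state: send back a copy.
            return dict(self.spaces[conn_id])
        raise ValueError(f"unknown message type: {kind}")


backup = GenericBackup()
backup.open_connection("worker-a")
backup.open_connection("worker-b")
backup.handle("worker-a", {"type": "memory_update", "data": {"x": 1}})
backup.handle("worker-a", {"type": "memory_update", "data": {"x": 2, "y": 7}})
backup.handle("worker-b", {"type": "memory_update", "data": {"queue": [4, 5]}})

restored_a = backup.handle("worker-a", {"type": "reconnect"})
restored_b = backup.handle("worker-b", {"type": "reconnect"})
print(restored_a, restored_b)
```

One backup object serves both workers here, and nothing in `spaces` depends on what code either worker runs, which is the property claim 6 singles out.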
7. A system for generically providing backup for a plurality of working processes comprising:
a processing system;
a memory storing code for operating said processing system, said code comprising:
computer code that, for each of said plurality of working processes, opens a connection;
computer code that, for each of said plurality of working processes, designates a memory space;
computer code that, for each of said plurality of working processes, receives a plurality of messages on the connection;
computer code, that for each of said plurality of working processes, for each message that is a memory update message updates the contents of the memory space in response to said each of the plurality of messages; and
computer code, that for each of said plurality of working processes, for each of the plurality of messages that is a reconnect request indicating previous working process failure, sends information in said memory space to a source of the reconnect request.
12. A method for enabling recovery of an original working process including:
maintaining state information associated with the original working process;
upon failure of the original working process, performing:
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information; and
wherein said maintaining said state information includes:
maintaining a backup memory image associated with the original working process; and
maintaining current stack frame information associated with the original working process.
Dependent claims: 13, 14, 15
16. A method for enabling recovery of an original working process including:
maintaining state information associated with the original working process;
upon failure of the original working process, performing:
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information; and
wherein said state information includes dynamic memory information and a program counter value.
17. A method for enabling recovery of an original working process including:
maintaining state information associated with the original working process;
upon failure of the original working process, performing:
obtaining a copy of a code segment associated with the original working process;
loading the code segment into a portion of a memory space; and
causing the code segment to execute as an active working process, responsive to said state information; and
wherein said maintaining state information includes:
receiving checkpoint information including current stack frame information from the original working process;
modifying a backup memory image associated with the original working process in response to the checkpoint information;
storing said current stack frame information.
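
The state maintenance recited in claim 17 can be illustrated with a small sketch. It assumes, purely for illustration, that each checkpoint carries a dictionary of changed memory entries plus a dictionary describing the current stack frame. `BackupState`, `apply_checkpoint`, and the `memory_delta` and `stack_frame` keys are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class BackupState:
    memory_image: dict = field(default_factory=dict)  # backup memory image
    stack_frame: dict = field(default_factory=dict)   # latest stack frame information


def apply_checkpoint(state, checkpoint):
    # Modify the backup memory image in response to the checkpoint information.
    state.memory_image.update(checkpoint["memory_delta"])
    # Store the current stack frame information, replacing the previous one.
    state.stack_frame = dict(checkpoint["stack_frame"])
    return state


state = BackupState()
apply_checkpoint(state, {
    "memory_delta": {"balance": 100},
    "stack_frame": {"function": "post", "line": 12},
})
apply_checkpoint(state, {
    "memory_delta": {"balance": 80, "pending": 1},
    "stack_frame": {"function": "post", "line": 30},
})
print(state)
```

After the second checkpoint the image reflects the latest values and only the most recent stack frame is retained, so recovery would continue from the last checkpoint rather than from the start.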
19. A system for running fault-tolerant software including:
a processing system;
a memory storing code for operating said processing system, said code comprising:
a generic backup module;
an original working process in communication with the generic backup module;
a monitor module, capable of monitoring the run state of the original working process;
a nonvolatile storage device including a code segment associated with the original working process, information on said nonvolatile storage device accessible to said monitor module;
a code segment associated with the original working process stored on said nonvolatile storage device.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28
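
The claim does not say how the monitor module observes the run state of a working process. One common approach, shown here only as an assumed example, is a heartbeat timeout: each worker reports in periodically and the monitor treats a worker as failed once it has been silent for too long. The `Monitor` class, its `timeout` parameter, and the explicit `now` timestamps are all hypothetical.

```python
class Monitor:
    """Tracks the last heartbeat per worker (heartbeats are an assumed mechanism)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # worker name -> time of the last heartbeat

    def heartbeat(self, name, now):
        self.last_seen[name] = now

    def failed(self, now):
        # A worker counts as failed if it has been silent longer than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = Monitor(timeout=5.0)
monitor.heartbeat("worker-a", now=0.0)
monitor.heartbeat("worker-b", now=0.0)
monitor.heartbeat("worker-a", now=4.0)  # worker-b stays silent

failed = monitor.failed(now=6.0)
print(failed)  # ['worker-b']
```

In a fuller sketch, each name returned by `failed` would trigger loading that worker's code segment from storage and reconnecting to the backup module for its saved state.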
29. A system for running fault-tolerant software including:
a first processor including a generic backup module;
a second processor including an original working process, the second processor coupled to the first processor, the original working process in communication with the generic backup module;
a third processor including a monitor module, the third processor coupled to the second processor, the monitor module capable of monitoring a run state of the original working process; and
a nonvolatile storage device including a code segment associated with the original working process, information on said nonvolatile storage device accessible by said third processor.
Dependent claims: 30, 31, 32, 33
34. A computer program product for enabling recovery of an original working process comprising:
computer code that maintains state information associated with the original working process by maintaining a backup memory image associated with the original working process and maintaining current stack frame information associated with the original working process;
computer code that, upon failure of the original working process, obtains a copy of a code segment associated with the original working process;
loads the code segment into a portion of a memory space; and
causes the code segment to execute as an active working process, responsive to said state information; and
a computer readable medium that stores the computer code.
Dependent claims: 35
36. A computer program product for generically providing backup for a plurality of working processes, comprising:
computer code that, for each of said plurality of working processes, opens a connection;
computer code that, for each of said plurality of working processes, designates a memory space;
computer code that, for each of said plurality of working processes, receives a plurality of messages on the connection;
computer code, that for each of said plurality of working processes, for each message that is a memory update message updates the contents of the memory space in response to said each of the plurality of messages;
computer code, that for each of said plurality of working processes, for each of the plurality of messages that is a reconnect request indicating previous working process failure, sends information in said memory space to a source of the reconnect request; and
a computer readable medium that stores the computer codes.
Dependent claims: 37
38. A system for generically providing backup for a plurality of working processes comprising:
for each of said plurality of working processes, means for opening a connection;
for each of said plurality of working processes, means for designating a memory space;
for each of said plurality of working processes, means for receiving a plurality of messages on the connection;
for each of said plurality of working processes, for each message that is a memory update message, means for updating the contents of the memory space in response to said each of the plurality of messages; and
for each of said plurality of working processes, for each of the plurality of messages that is a reconnect request indicating previous working process failure, means for sending information in said memory space to a source of the reconnect request.
Specification