Memory management in fault tolerant computer systems utilizing a first and second recording mechanism and a reintegration mechanism
Abstract
A memory management system for a fault tolerant computer system. The memory management system includes a first recording mechanism which can be activated to record memory update events; a second recording mechanism which records at least a limited number of memory update events; a fault input for a fault signal to activate the first recording mechanism in the event of a fault event; and a memory reintegration mechanism to reintegrate at least parts of memory identified in the first and second recording mechanisms. Preferably, the recording of memory updates (writes) is not based on recording each address accessed, but rather on memory segments (pages) updated (written to). Further, a fault tolerant computer system includes a plurality of synchronous processing sets, each having a processor with internal memory and operating in lockstep, and an out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync signal.
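The interplay of the two recording mechanisms can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `UpdateRecorder`, the 4 KiB page size, and the use of a Python set as the page record are all hypothetical choices made for illustration. The bounded FIFO keeps recent write addresses; once the fault signal is active, addresses leaving the FIFO are decoded to page numbers and recorded.

```python
from collections import deque

PAGE_SHIFT = 12  # assumption: 4 KiB pages; the source does not fix a page size


class UpdateRecorder:
    """Toy model of the first and second recording mechanisms (hypothetical)."""

    def __init__(self, fifo_depth):
        if fifo_depth < 1:
            raise ValueError("fifo_depth must be at least 1")
        # Second recording mechanism: bounded FIFO of recent write addresses.
        self.fifo = deque(maxlen=fifo_depth)
        # First recording mechanism: record of updated pages, idle until a fault.
        self.dirty_pages = set()
        self.fault_active = False

    @staticmethod
    def decode_page(address):
        # Address decoder: reduce a write address to its page number.
        return address >> PAGE_SHIFT

    def write(self, address):
        # When the FIFO is full, the oldest address appears at its output.
        if len(self.fifo) == self.fifo.maxlen:
            oldest = self.fifo.popleft()
            # The decoder passes the page signal on only while the fault is active.
            if self.fault_active:
                self.dirty_pages.add(self.decode_page(oldest))
        self.fifo.append(address)

    def fault(self):
        # Fault input: activates the first recording mechanism.
        self.fault_active = True

    def pages_to_reintegrate(self):
        # Pages identified in the first record plus those still held in the FIFO.
        return self.dirty_pages | {self.decode_page(a) for a in self.fifo}


rec = UpdateRecorder(fifo_depth=2)
rec.write(0x0000)  # page 0
rec.write(0x1000)  # page 1
rec.write(0x2000)  # page 2; 0x0000 leaves the FIFO before any fault and is dropped
rec.fault()
rec.write(0x3000)  # page 3; 0x1000 leaves the FIFO and is recorded as page 1
print(sorted(rec.pages_to_reintegrate()))  # [1, 2, 3]
```

In this model page 0 is not reported, because its write left the FIFO before the fault signal arrived; only writes recent enough to still be in the FIFO at the fault, and writes after it, are covered.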
33 Claims
1. A memory management system for a fault tolerant computer system, said memory management system comprising:
a first recording mechanism which can be activated to record memory update events;
a fault input for a fault signal to activate said first recording mechanism in the event of a fault event;
a second recording mechanism having a capacity to record at least a limited number of memory update events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said fault event, said second recording mechanism comprises a first-in-first-out buffer, an output of said first-in-first-out buffer is connected to said first recording mechanism, and further said first-in-first-out buffer stores up to a predetermined number of update addresses, an address decoder is connected to said output of said first-in-first-out buffer to generate a page signal representative of a memory update address output from said first-in-first-out buffer and said address decoder is responsive to said fault signal to pass said page signal to said first recording mechanism; and
a memory reintegration mechanism to reintegrate at least parts of memory identified in said first and second recording mechanisms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.

9. A fault tolerant computer system comprising a plurality of synchronous processing sets, each including a processor and internal memory and operating in lockstep, and an out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync signal, wherein each processing set includes:
a first recording mechanism which can be activated to record memory update events;
a fault input for a fault signal to activate said first recording mechanism in the event of a fault event;
a second recording mechanism having a capacity to record at least a limited number of memory update events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said fault event, said second recording mechanism comprises a first-in-first-out buffer, an output of said first-in-first-out buffer is connected to said first recording mechanism, and further said first-in-first-out buffer stores up to a predetermined number of update addresses, an address decoder is connected to said output of said first-in-first-out buffer to generate a page signal representative of a memory update address output from said first-in-first-out buffer and said address decoder is responsive to said fault signal to pass said page signal to said first recording mechanism; and
a memory reintegration mechanism to reintegrate at least parts of memory identified in said first and second recording mechanisms.

10. A fault tolerant computer system comprising a plurality of synchronous processing sets, each comprising a processor and internal memory and operating in lockstep, and an out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync signal, wherein each processing set also comprises:
a first recording mechanism which can be activated to record memory write events;
a fault input for receiving said out-of-sync signal to activate said first recording mechanism in the event of said out-of-sync event;
a second recording mechanism having a capacity to record at least a limited number of memory write events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said out-of-sync event, said second recording mechanism comprises a first-in-first-out buffer, an output of said first-in-first-out buffer is connected to said first recording mechanism, and further said first-in-first-out buffer stores up to a predetermined number of update addresses, an address decoder is connected to said output of said first-in-first-out buffer to generate a page signal representative of a memory update address output from said first-in-first-out buffer and said address decoder is responsive to said out-of-sync signal to pass said page signal to said first recording mechanism; and
a memory reintegration mechanism to reintegrate in an out-of-sync processing set at least parts of memory identified in said first and second recording mechanisms.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19.

20. A method for reintegration of a processing set of a fault tolerant computer system comprising a plurality of synchronous processing sets, each comprising a processor and internal memory and operating in lockstep, and a fault detector for detecting a fault event and for generating a fault signal, said method comprising:
maintaining a temporary record of memory update events over a limited period, said temporary record being stored in a first-in-first-out buffer;
connecting a recording mechanism for recording an output of said first-in-first-out buffer;
responding to said fault signal to activate a further record of memory update events following said fault signal;
maintaining a record of recent memory update events up to a number sufficient to cover the time to activate said further record;
storing up to a predetermined number of update addresses in said first-in-first-out buffer, supplying the output of said first-in-first-out buffer to an address decoder to generate a page signal representative of a memory update address output from said first-in-first-out buffer, and recording said page signal as part of said further record when said fault signal is active; and
performing memory reintegration in a processor in which a fault has occurred for at least those parts of memory identified in said temporary and further memory records.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29.
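
The final step of the method, reintegrating only the identified parts of memory, can be sketched as a page-wise copy. This is a minimal illustration under assumptions the claim does not state: the function name `reintegrate` is hypothetical, memory is modeled as a `bytearray`, the page size is taken as 4 KiB, and the copy source is assumed to be the memory of a processing set that is still in sync.

```python
PAGE_SIZE = 4096  # assumption: the source does not specify a page size


def reintegrate(good_memory, faulty_memory, pages):
    """Copy only the identified pages from good_memory into faulty_memory.

    Returns the number of bytes copied. Hypothetical helper for illustration.
    """
    if len(good_memory) != len(faulty_memory):
        raise ValueError("memories must be the same size")
    copied = 0
    for page in sorted(pages):
        start = page * PAGE_SIZE
        end = start + PAGE_SIZE
        if page < 0 or end > len(good_memory):
            raise ValueError(f"page {page} is outside memory")
        # Overwrite the possibly divergent page with the in-sync copy.
        faulty_memory[start:end] = good_memory[start:end]
        copied += PAGE_SIZE
    return copied


good = bytearray(b"\xAA" * (4 * PAGE_SIZE))   # memory of an in-sync processing set
faulty = bytearray(4 * PAGE_SIZE)             # memory of the set being reintegrated
copied = reintegrate(good, faulty, {1, 3})    # pages taken from both records
print(copied)  # 8192
```

Copying two pages instead of all four is the point of keeping the records: the cost of reintegration scales with the number of pages written around and after the fault, not with total memory size.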

30. A memory management system for a fault tolerant computer system, said memory management system comprising:
a first recording mechanism which can be activated to record memory update events;
a fault input for a fault signal to activate said first recording mechanism in the event of a fault event;
a second recording mechanism having a capacity to record at least a limited number of memory update events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said fault event; and
a memory reintegration mechanism to reintegrate at least parts of memory identified in said first and second recording mechanisms, wherein said memory reintegration mechanism is operative to reintegrate memory pages identified in said first and second recording mechanisms.

31. A fault tolerant computer system having a plurality of synchronous processing sets, each including a processor and internal memory and operating in lockstep, and an out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync signal, wherein each processing set includes a memory management system, said memory management system comprising:
a first recording mechanism which can be activated to record memory update events;
a fault input for a fault signal to activate said first recording mechanism in the event of a fault event;
a second recording mechanism having a capacity to record at least a limited number of memory update events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said fault event; and
a memory reintegration mechanism to reintegrate at least parts of memory identified in said first and second recording mechanisms, wherein said memory reintegration mechanism is operative to reintegrate memory pages identified in said first and second recording mechanisms.

32. A fault tolerant computer system comprising a plurality of synchronous processing sets, each comprising a processor and internal memory and operating in lockstep, and an out-of-sync detector for detecting an out-of-sync event and for generating an out-of-sync signal, wherein each processing set also comprises:
a first recording mechanism which can be activated to record memory write events;
a fault input for receiving said out-of-sync signal to activate said first recording mechanism in the event of said out-of-sync event;
a second recording mechanism having a capacity to record at least a limited number of memory write events, and to maintain a record of recent memory update events up to a number sufficient to cover the time to activate said first recording mechanism following said out-of-sync event, and
a memory reintegration mechanism to reintegrate in an out-of-sync processing set at least parts of memory identified in said first and second recording mechanisms, wherein said memory reintegration mechanism is operative to reintegrate memory pages identified in said first and second recording mechanisms.

33. A method for reintegration of a processing set of a fault tolerant computer system including a plurality of synchronous processing sets, each having a processor and internal memory and operating in lockstep, and a fault detector for detecting a fault event and for generating a fault signal, said method comprising:
maintaining a temporary record of memory update events over a limited period;
responding to said fault signal to activate a further record of memory update events following said fault state;
maintaining a record of recent memory update events up to a number sufficient to cover the time to activate said further record;
performing memory reintegration in a processor in which a fault has occurred for at least those parts of memory identified in said temporary and further memory records.
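
The requirement that the temporary record hold "a number sufficient to cover the time to activate said further record" amounts to a sizing rule for the buffer. One way to reason about it, offered here only as a sketch, is to multiply the worst-case write rate by the activation latency and round up. The function name and the figures in the usage lines are hypothetical; the source gives no concrete rates or latencies.

```python
def min_fifo_depth(activation_latency_cycles, peak_writes, per_cycles):
    """Smallest number of entries that covers the activation delay.

    Assumes at most `peak_writes` memory writes occur in any window of
    `per_cycles` clock cycles (hypothetical parameters, integer valued).
    """
    if activation_latency_cycles < 0 or peak_writes < 0 or per_cycles <= 0:
        raise ValueError("latency and writes must be >= 0, per_cycles > 0")
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-activation_latency_cycles * peak_writes // per_cycles)


# Illustrative numbers only: one write every 4 cycles, 41 cycles to activate.
print(min_fifo_depth(41, 1, 4))  # 11
```

A real design would likely add margin on top of this bound, since any write that leaves the buffer before the further record is active would go unrecorded.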
Specification