Memory back-up system
First Claim
1. Memory backup apparatus for a fault-tolerant computer system having system memories for storing data, a processing element for performing a plurality of data processing tasks and computations, said processing element generating write control signals to cause information to be written into said system memories and task control signals for controlling said system during a context switch, and means for monitoring fault occurrences in said computer system and for generating fault signals, said backup apparatus comprising,
a first memory area and a second, physically separate memory area located in said system memories, said memory areas storing duplicate copies of data and subsequent computational results generated by said processing element;
a temporary storage area generating a full signal when a predetermined number of storage locations in said memory having been written to;
first means responsive to said write signals for writing data from said first memory area into said temporary storage area and for writing computational results produced by said processing element into said temporary storage area;
second means responsive to said full signal and to said task control signals for writing selected portions of said temporary storage area into said first memory area when said temporary storage area is full or when said task control signals indicate a context switch is requested, said second means producing a completion signal when said writing has been completed;
third means responsive to said completion signal and to said fault signals for writing said selected portions of said temporary storage area into said second memory area when the writing of data from said temporary storage area into said first memory area has been completed without a fault condition being detected;
a backup status register and means connected to said second and said third writing means for updating backup status information stored in said backup status register, said backup status information identifying memory areas which have been written from said temporary storage area, wherein said second writing means generates a start signal at the beginning of a storage operation to said first memory area and said updating means is responsive to said start signal for updating said backup status register and is responsive to said completion signal for updating said backup status register;
means responsive to said backup status information for restarting a data processing task utilizing said initial data stored in said first memory area if a system failure has occurred before the beginning of a storage operation from said temporary storage area to said first memory area; and
means responsive to said backup status information for writing the contents of said second memory area into said first memory area and restarting the data processing task using data stored in said second memory area if a system failure has occurred after the beginning of a storage operation from said temporary memory to said first memory area but before said storage operation has been completed.
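The two restart cases recited at the end of claim 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `BackupStatus` and `recover`, the three status values, and the use of Python lists as memory areas are hypothetical and do not appear in the patent, which describes hardware means rather than software.

```python
from enum import Enum


class BackupStatus(Enum):
    # Hypothetical encoding of the backup status register contents.
    NOT_STARTED = 0    # no storage operation to the first area has begun
    FIRST_STARTED = 1  # start signal recorded, completion signal not yet recorded
    FIRST_DONE = 2     # completion signal recorded


def recover(status, first_area, second_area):
    """Return the memory image from which the task is restarted.

    Only the two failure cases recited in claim 1 are modeled.
    """
    if status is BackupStatus.NOT_STARTED:
        # Failure before the copy began: the first area still holds the
        # initial data, so the task restarts from it unchanged.
        return first_area
    if status is BackupStatus.FIRST_STARTED:
        # Failure during the copy: the first area may be partly overwritten,
        # so it is restored from the second area before restarting.
        first_area[:] = second_area
        return first_area
    raise ValueError("failure case not covered by this sketch")
```

For example, `recover(BackupStatus.FIRST_STARTED, first, second)` overwrites `first` with the contents of `second` and returns it.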
Abstract
Apparatus for maintaining duplicate copies of information stored in fault-tolerant computer main memories is disclosed. A non write-through cache memory associated with each of the system's processing elements stores computations generated by that processing element. At a context switch, the stored information is sequentially written to two separate main memory units. A separate status area in main memory is updated by the processing element both before and after each writing operation so that a fault occurring during data processing or during any storage operation leaves the system with sufficient information to be able to reconstruct the data without loss of integrity.
To efficiently transfer information between the cache memory and the system main memories without consuming a large amount of processing time at context switches, a block status memory associated with the cache memory contains an entry for each data block in the cache memory. The entry indicates whether the corresponding data block has been modified during data processing or written with computational data from the processing element. The storage operations are carried out by high-speed hardware which stores only the modified data blocks. Additional special-purpose hardware simultaneously invalidates all cache memory entries so that a new task can be loaded and started.
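The role of the block status memory can be sketched in Python as a software analogy. The class `BlockStatusCache`, its method names, and the use of dictionaries as main memories are assumptions made for illustration; the abstract describes high-speed hardware, not this code.

```python
class BlockStatusCache:
    """Toy model of a non write-through cache with a per-block status entry."""

    def __init__(self):
        self.blocks = {}       # address -> cached data block
        self.modified = set()  # block status: addresses written since last flush

    def read(self, addr, memory):
        # Load the block from main memory on a miss; reads do not mark it modified.
        if addr not in self.blocks:
            self.blocks[addr] = memory[addr]
        return self.blocks[addr]

    def write(self, addr, value):
        # Non write-through: only the cache changes, and the block is marked.
        self.blocks[addr] = value
        self.modified.add(addr)

    def flush(self, first_memory, second_memory):
        """Store only the modified blocks to both memories, then invalidate all."""
        dirty = {addr: self.blocks[addr] for addr in self.modified}
        first_memory.update(dirty)   # first copy
        second_memory.update(dirty)  # then the duplicate copy
        self.blocks.clear()          # invalidate every cache entry
        self.modified.clear()
        return len(dirty)
```

Because `flush` touches only the marked blocks, its cost in this model scales with the number of modified blocks rather than with the cache size, which is the saving the abstract attributes to the block status memory.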
5 Claims
5. A method for backing up data in a fault-tolerant computer system having a first, a second and a third system memories for storing data, a processing element for performing a plurality of data processing tasks and computations, a status register and means for detecting the occurrence of a fault in said computer system, said first and third system memories storing duplicate copies of data and subsequent computational results generated by said processing element, said method comprising the steps of:
A. during the processing of a task, writing data associated with said task from said first memory into said second memory and writing computational results produced by said processing element into said second memory;
B. detecting a context switch condition or a memory full condition in which a predetermined number of memory locations have been written to;
C1. after a context switch has been initiated or after a memory full condition has been detected copying selected portions of said second memory into said first memory;
C2. checking said fault detector to determine if a fault has been detected;
C3. storing a first status code in said status register indicating that the copying of selected portions of said second memory to said first memory has begun if no fault has been detected in step C2;
C4. detecting the completion of the copying of said selected portions of said second memory to said first memory;
C5. checking said fault detector to determine if a fault has been detected;
C6. storing a status code in said status register if no fault has been detected in step C5, said second status code indicating that the copying selected portions of said second memory to said first memory has been completed;
D1. copying said selected portions of said second memory into said third memory when the copying of data from said second memory into said first memory has been completed without a fault condition being detected;
D2. checking said status register to obtain the status code stored therein;
D3. checking said fault detector to determine if a fault has been detected;
D4. storing a third status code in said register when said status code stored in said register is said second status code and no fault has been detected in step D3, said third code indicating that copying of said selected portions of said second memory to said third memory has begun;
E. restarting said system using data stored in said first memory if a fault is detected in step C2;
F. restarting said system using data stored in said third memory if a fault is detected in step C5; and
G. restarting said system using data stored in said first memory if a fault is detected in step D3.
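Steps C1 through G can be traced with a small Python sketch. This is one possible reading, not the claimed method itself: the function `backup`, the result strings, the integer status codes, and the dictionaries standing in for the three memories are hypothetical, and the placement of each fault check relative to each copy is an interpretation chosen so that the restart source named in steps E, F and G is still intact when it is used.

```python
def backup(second, first, third, faults):
    """Trace one backup cycle.

    second: temporary memory holding the selected portions (address -> data)
    first, third: the duplicate system memories, modified in place
    faults: three booleans, True meaning a fault was seen at C2, C5, D3
    Returns (action, status_code); status_code is None if none was stored.
    """
    fault_c2, fault_c5, fault_d3 = faults
    status = None

    if fault_c2:                      # C2 detects a fault
        return ("restart_from_first", status)   # step E
    status = 1                        # C3: copy to first memory has begun
    first.update(second)              # C1, C4: copy and detect completion

    if fault_c5:                      # C5 detects a fault
        return ("restart_from_third", status)   # step F: third is untouched
    status = 2                        # C6: copy to first memory completed

    if fault_d3:                      # D2, D3: status is 2, fault detected
        return ("restart_from_first", status)   # step G: first is complete
    status = 3                        # D4: copy to third memory has begun
    third.update(second)              # D1: write the duplicate copy
    return ("ok", status)
```

In this reading, a fault seen at C5 leaves the status code at 1 and the third memory unchanged, so the third memory still holds a consistent earlier image to restart from.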
Specification