Replay mechanism for soft error recovery
Abstract
A processor is provided that implements a replay mechanism to recover from soft errors. The processor includes a protected execution unit, a check unit to detect errors in results generated by the protected execution unit, and a replay unit to track selected instructions issued to the protected execution unit. When the check unit detects an error, it triggers the replay unit to reissue the selected instructions to the protected execution unit. One embodiment of the replay unit provides an instruction buffer that includes pointers to track issue and retirement status of in-flight instructions. When the check unit indicates an error, the replay unit resets a pointer to reissue the instruction for which the error was detected.
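The pointer bookkeeping described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `ReplayQueue`, the `write` counter, the method names, and the circular-buffer layout are all hypothetical. It also assumes results are checked in program order, so that the instruction with the detected error is the oldest one not yet retired.

```python
class ReplayQueue:
    """Hypothetical software model of an instruction buffer with issue/retire pointers."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.write = 0   # next free slot (assumption: not named in the text)
        self.issue = 0   # next instruction to issue
        self.retire = 0  # next instruction to retire

    def load(self, instr):
        # A slot stays occupied until its instruction retires.
        if self.write - self.retire == len(self.slots):
            raise OverflowError("replay queue full")
        self.slots[self.write % len(self.slots)] = instr
        self.write += 1

    def issue_next(self):
        # Hand the next instruction to the execution unit, or None if all are issued.
        if self.issue == self.write:
            return None
        instr = self.slots[self.issue % len(self.slots)]
        self.issue += 1
        return instr

    def retire_next(self):
        # Called when the check unit reports no error for the oldest in-flight instruction.
        if self.retire == self.issue:
            raise RuntimeError("nothing in flight to retire")
        self.retire += 1

    def replay(self):
        # Called when the check unit reports an error: reset the issue pointer
        # so issue resumes at the oldest unretired instruction.
        self.issue = self.retire


q = ReplayQueue(4)
for name in ["add", "sub", "mul"]:
    q.load(name)

first = q.issue_next()     # "add" issued
q.retire_next()            # "add" checked clean and retired
q.issue_next()             # "sub" issued
q.issue_next()             # "mul" issued
q.replay()                 # error detected on "sub"
replayed = q.issue_next()  # issue resumes at "sub"
```

In this model, instructions issued after the faulting one ("mul" above) are simply issued again after the replay, which corresponds to flushing them from the execution unit.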
16 Claims
1. A processor comprising:
a protected execution unit to process instructions;
a check unit to detect an error associated with processed instructions; and
a replay queue to issue instructions to the protected execution unit for processing, to track the issued instructions, and to reissue selected issued instructions when the check unit detects an error, the replay queue including first and second pointers to indicate a next instruction to issue and a next instruction to retire, wherein the replay queue adjusts the first and second pointers to reissue instructions to the execution unit beginning with an instruction that generated a result mismatch.
2. The processor of claim 1, wherein
the protected execution unit comprises first and second execution units to process instructions in lock step and the replay queue comprises first and second replay queues to provide instructions to the first and second execution units, respectively.
3. The processor of claim 1, wherein
instructions are flushed from the execution unit when the check unit indicates an error.
4. The processor of claim 1, wherein
the execution units operate in lock step when the processor is in a high reliability mode and the execution units operate independently when the processor is in a high performance mode.
5. The processor of claim 1, wherein
the processor implements a recovery algorithm if an instruction that triggers a replay generates a mismatch when it is replayed.
6. A method for executing instructions with high reliability, comprising:
storing an instruction temporarily in a replay buffer;
issuing the instruction to a protected execution unit including staging the instruction to the protected execution unit, and adjusting a first flag in the buffer to indicate the instruction has been issued including setting a first pointer to indicate a buffer slot in which the issued instruction is stored;
setting a second pointer to indicate a buffer slot in which a next instruction to retire is stored;
checking results generated by the instruction in the protected execution unit; and
reissuing the instruction to the protected execution unit if an error is indicated, wherein reissuing the instruction includes copying the second flag to the first flag.
7. The method of claim 6, further comprising
retiring the instruction when no error is indicated.
8. The method of claim 7, wherein
retiring the instruction includes adjusting a second pointer to indicate the instruction has retired; and
updating an architectural state data with the result generated by the instruction.
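The method steps of claims 6 to 8 (store, issue, check, then reissue or retire) can be walked through in a short simulation. This is a sketch under stated assumptions rather than the claimed implementation: `run_with_replay`, `make_flaky_unit`, and `max_replays` are hypothetical names, the protected execution unit is modeled as a function returning two redundant results, and an error is taken to mean that the two results differ.

```python
def run_with_replay(program, execute, max_replays=3):
    """Issue each instruction, check its results, and replay on a mismatch."""
    buffer = list(program)  # replay buffer holding the instructions
    issue_ptr = 0           # slot of the next instruction to issue
    retire_ptr = 0          # slot of the next instruction to retire
    arch_state = []         # results committed at retirement
    replays = 0             # total replays across the whole program

    while retire_ptr < len(buffer):
        instr = buffer[issue_ptr]
        issue_ptr += 1                       # mark the instruction as issued
        result_a, result_b = execute(instr)  # redundant results (assumption)

        if result_a != result_b:             # check step: error indicated
            replays += 1
            if replays > max_replays:
                # Hypothetical guard so a persistent fault cannot loop forever.
                raise RuntimeError("replay did not clear the error")
            issue_ptr = retire_ptr           # reissue from the unretired instruction
            continue

        arch_state.append(result_a)          # update architectural state
        retire_ptr += 1                      # mark the instruction as retired

    return arch_state, replays


def make_flaky_unit(fault_at):
    """Build an execution unit whose second result is corrupted on one call only."""
    calls = {"n": 0}

    def execute(instr):
        calls["n"] += 1
        good = instr * 2
        bad = good ^ 1 if calls["n"] == fault_at else good
        return good, bad

    return execute


# The fault hits the second call, which is the first attempt at instruction 2.
state, replays = run_with_replay([1, 2, 3], make_flaky_unit(fault_at=2))
```

Because architectural state is only updated at retirement, the corrupted result from the faulting attempt never becomes visible; only the clean replayed result is committed.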
9. A computer system comprising:
a processor including a protected execution unit to execute instructions in a manner that facilitates soft error detection, a check unit to monitor the protected execution unit and to generate a signal when an error is indicated, a replay unit to provide instructions to the protected execution unit, to track the instructions until they are retired, and to replay selected instructions when the check unit indicates an error, the replay unit includes first and second pointers to indicate a next instruction to issue and a next instruction to retire, respectively, and a storage structure to provide a recovery algorithm to the processor when replay of selected instructions does not eliminate the mismatch;
wherein the protected execution unit is flushed prior to the replay when an error is indicated; and
wherein the replay unit and the protected execution unit are flushed prior to implementing a recovery routine.
10. The computer system of claim 9, wherein
the storage structure is a non-volatile memory structure.
11. The computer system of claim 9, wherein
the protected execution unit comprises first and second execution units and the replay unit provides identical instructions to the first and second execution units.
12. The computer system of claim 9, wherein
the protected execution unit to execute instructions to facilitate detection of soft errors.
13. The computer system of claim 12, wherein
the protected execution unit includes redundant execution units that execute instructions to facilitate detection of soft errors.
14. The computer system of claim 12, wherein
the protected execution unit includes parity-protected storage structures to execute instructions to facilitate detection of soft errors.
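The two-stage response in claim 9 (flush and replay first, then fall back to a recovery routine if the mismatch persists) can be sketched as follows. All names here (`run_protected`, `stuck_unit`, `transient_unit`, `software_recovery`) are hypothetical, the execution unit is again modeled as a function returning two redundant results, and the recovery routine is reduced to a placeholder function because the claims do not say what it does.

```python
def run_protected(instr, execute, recovery_routine):
    """Try one replay on a mismatch, then escalate to the recovery routine."""
    events = []

    result_a, result_b = execute(instr)
    if result_a == result_b:
        return result_a, events

    # First response: flush the execution unit, then replay the instruction.
    events.append("flush execution unit")
    events.append("replay")
    result_a, result_b = execute(instr)
    if result_a == result_b:
        return result_a, events

    # Replay did not eliminate the mismatch: flush both units and recover.
    events.append("flush replay unit and execution unit")
    events.append("recovery routine")
    return recovery_routine(instr), events


def stuck_unit(instr):
    # Models a persistent fault: the redundant results never agree.
    return instr + 1, instr + 2


def transient_unit():
    # Models a soft error: only the first execution produces a mismatch.
    seen = []

    def execute(instr):
        seen.append(instr)
        good = instr + 1
        return (good, good + 1) if len(seen) == 1 else (good, good)

    return execute


def software_recovery(instr):
    # Placeholder for a routine fetched from the storage structure (assumption).
    return instr + 1


result, events = run_protected(41, stuck_unit, software_recovery)
t_result, t_events = run_protected(41, transient_unit(), software_recovery)
```

The transient case ends after a single replay, while the persistent case escalates through all four recorded steps.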
15. A processor comprising:
first and second execution cores to process identical instructions in lock step, each execution core including a replay unit to track instructions that have yet to retire, each replay unit including buffer slots to store instructions for execution and a first and second pointers to indicate a next instruction to issue and a next instruction to retire, respectively;
a check unit to compare instructions results generated by the execution cores and to trigger the replay unit to resteer the first and second execution cores to an instruction when the instruction results generate a mismatch; and
wherein each replay unit copies the second pointer to the first pointer when the instruction results generate a mismatch.
16. The processor of claim 15, wherein
the check unit signals an instruction flush when a mismatch is detected.
Specification