Mitigating silent data corruption in a buffered memory module architecture
Abstract
Embodiments of the invention are generally directed to systems, apparatuses, and methods for mitigating silent data corruption in a fully-buffered memory module architecture. In an embodiment, a memory controller includes a memory channel bit-lane error detector having an M-bit CRC and N-bit CRC, wherein N is less than M. The N-bit CRC is used if at least one bit-lane of the memory channel fails. In one embodiment, the memory controller selectively applies the strong error detection capability of an error correction code (ECC) in combination with the N-bit CRC to signal the need to resend faulty data, if at least one bit-channel has failed. Other embodiments are described and claimed.
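The CRC selection described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the widths M = 12 and N = 6, the generator polynomials, and the names `crc_bits`, `select_crc`, and `frame_checksum` are all assumptions made for the example. The source only states that N is less than M and that the N-bit CRC is used once at least one bit-lane has failed.

```python
# Hypothetical widths and polynomials, chosen for illustration only;
# the source states just that N < M.
M_BITS, M_POLY = 12, 0x80F   # x^12 + x^11 + x^3 + x^2 + x + 1
N_BITS, N_POLY = 6, 0x03     # x^6 + x + 1


def crc_bits(data: bytes, width: int, poly: int) -> int:
    """Bitwise MSB-first CRC with zero initial value and no final XOR."""
    mask = (1 << width) - 1
    reg = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (reg >> (width - 1)) & 1
            reg = (reg << 1) & mask
            if top ^ bit:
                reg ^= poly
    return reg


def select_crc(failed_lanes: int):
    """Use the shorter N-bit CRC once at least one bit-lane has failed."""
    if failed_lanes >= 1:
        return N_BITS, N_POLY
    return M_BITS, M_POLY


def frame_checksum(payload: bytes, failed_lanes: int):
    """Return (crc_width, crc_value) for the current channel state."""
    width, poly = select_crc(failed_lanes)
    return width, crc_bits(payload, width, poly)
```

A shorter CRC leaves fewer check bits to distinguish corrupted frames from valid ones, which is why the abstract pairs the N-bit CRC with ECC-based detection and a resend when a bit-lane has failed.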
113 Citations
25 Claims
1. An apparatus comprising:
a first agent to be coupled with a buffered memory module channel, the first agent having an M-bit cyclic redundancy check (CRC) implementation and an N-bit CRC implementation, wherein the N-bit CRC implementation is to be selected if at least one bit-lane of the buffered memory module channel fails;
a second agent coupled with the buffered memory module channel, the second agent to indicate whether data received from the buffered memory module channel contains a correctable error; and
a third agent coupled with the second agent, the third agent capable of signaling a data retry if at least one bit-lane of the buffered memory module channel fails and the second agent indicates a correctable error.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
receiving, at a first agent, a data packet from a buffered memory module channel, the first agent having an M-bit CRC and an N-bit CRC, wherein the N-bit CRC is to be applied to the data packet if at least one bit-lane of the buffered memory module channel fails;
detecting that at least one bit-lane of the buffered memory module channel has failed;
detecting a correctable error in the data packet based, at least in part, on an error correction code and validating that error through data retransmission and comparison; and
recovering, at least in part, a silent error rate (SER) budget based, at least in part, on a retry operation if the data packet contains a correctable error.
Dependent claims: 10, 11, 12, 13, 14
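One possible reading of the method steps above, as a sketch under stated assumptions: when a bit-lane has failed and the ECC reports a correctable error, the data is requested again and the two copies are compared. A matching copy suggests the error is really present in memory, while a differing copy suggests a transient channel error that the shorter CRC missed. The names `Ecc`, `read_with_retry`, `fetch`, and `ecc_status`, and the `max_retries` limit, are hypothetical and do not come from the claims.

```python
from enum import Enum
from typing import Callable, Tuple


class Ecc(Enum):
    CLEAN = 0
    CORRECTABLE = 1
    UNCORRECTABLE = 2


def read_with_retry(fetch: Callable[[], bytes],
                    ecc_status: Callable[[bytes], Ecc],
                    lane_failed: bool,
                    max_retries: int = 2) -> Tuple[bytes, int]:
    """Return (data, retries_used).

    fetch      -- hypothetical callback that reads one packet from the channel
    ecc_status -- hypothetical callback that classifies a packet via ECC
    """
    data = fetch()
    retries = 0
    # Retry only when the channel is degraded AND the ECC sees a correctable error.
    while (lane_failed
           and ecc_status(data) is Ecc.CORRECTABLE
           and retries < max_retries):
        again = fetch()
        retries += 1
        if again == data:
            # Same content on both transfers: treat the error as genuine
            # and leave it to normal ECC correction.
            break
        # Content changed between transfers: the first copy was likely
        # corrupted on the channel, so continue with the new copy.
        data = again
    return data, retries
```

With all lanes healthy the function never retries, which mirrors the claim language that the retry is conditioned on a failed bit-lane.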
15. A system comprising:
a memory device coupled with a buffered memory module channel;
a memory channel bit-lane error detector coupled with the buffered memory module channel, the memory channel bit-lane error detector having an M-bit CRC and an N-bit CRC, wherein the N-bit CRC is to be selected if at least one bit-lane of the buffered memory module channel fails;
a memory content error detector coupled with the buffered memory module channel, the memory content error detector to indicate whether data received from the buffered memory module channel contains a correctable error; and
a retry engine coupled with the memory content error detector, the retry engine capable of signaling a data retry if at least one bit-lane of the buffered memory module channel fails and the memory content error detector indicates a correctable error.
Dependent claims: 16, 17, 18, 19, 20, 21
22. An article of manufacture comprising:
an electronically accessible medium providing instructions that, when executed by an apparatus, cause the apparatus to receive, at a first agent, a data packet from a buffered memory module channel, the first agent having an M-bit CRC and an N-bit CRC, wherein the N-bit CRC is to be applied to the data packet if at least one bit-lane of the buffered memory module channel fails;
detect that at least one bit-lane of the buffered memory module channel has failed;
detect a correctable error in the data packet based, at least in part, on an error correction code and validating that error through data retransmission and comparison; and
recover, at least in part, a silent error rate (SER) budget based, at least in part, on a retry operation if the data packet contains a correctable error.
Dependent claims: 23, 24, 25
Specification