Preventing data corruption and single point of failure in fault-tolerant memory fabrics
Abstract
An example device in accordance with an aspect of the present disclosure includes a redundancy controller and/or memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric. Devices include engines to issue and/or respond to primitive requests, identify failures and/or fault conditions, and receive and/or issue containment mode indications.
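The failure identification summarized above can be illustrated with a minimal Python sketch of a redundancy controller's request timeout engine. It is not the patented implementation. The class `RequestTimeoutEngine`, its `issue` method, the `Indication.CONTAINMENT_MODE` token, the thread pool used to wait on replies, and the timeout values are all hypothetical. A module is marked failed under either condition recited in claim 1: a containment mode indication in reply to a primitive request, or no reply before the timeout expires.

```python
import concurrent.futures as cf
import time
from enum import Enum
from typing import Callable, Optional, Set


class Indication(Enum):
    # Hypothetical token a contained memory module returns instead of a normal reply.
    CONTAINMENT_MODE = "containment"


class RequestTimeoutEngine:
    """Marks a memory module failed on a containment mode indication or a timeout."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.failed: Set[str] = set()
        self._pool = cf.ThreadPoolExecutor(max_workers=4)

    def issue(self, module_id: str, primitive: Callable[[], object]) -> Optional[object]:
        """Issue one primitive request; return the reply, or None if the module is failed."""
        if module_id in self.failed:
            return None  # a degraded mode engine would serve this module instead
        future = self._pool.submit(primitive)
        try:
            reply = future.result(timeout=self.timeout_s)
        except cf.TimeoutError:
            self.failed.add(module_id)  # condition ii): no response before the timeout
            return None
        if reply is Indication.CONTAINMENT_MODE:
            self.failed.add(module_id)  # condition i): containment mode indication
            return None
        return reply

    def close(self) -> None:
        self._pool.shutdown(wait=True)  # waits for any request still running


engine = RequestTimeoutEngine(timeout_s=0.2)
print(engine.issue("mm0", lambda: b"data"))                     # b'data'
print(engine.issue("mm1", lambda: Indication.CONTAINMENT_MODE))  # None
print(engine.issue("mm2", lambda: time.sleep(0.6)))             # None (timed out)
engine.close()
print(sorted(engine.failed))                                    # ['mm1', 'mm2']
```

Once a module is in `failed`, later requests to it return immediately; in this sketch that is the point where a degraded mode engine would take over. A silent module and a contained module end up in the same state, so the controller has one failure path to handle.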
14 Claims
1. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising:
a normal mode engine to issue a primitive request to a memory module;
a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request; and
a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode, wherein reads of data located on a failed memory module by the degraded mode engine use parity reconstruction to reconstruct data on the failed memory module from surviving memory modules serving a stripe, and wherein writes to data located on the failed memory module use parity reconstruction to reconstruct lost pre-write data, followed by using the reconstructed lost pre-write data to compute a new parity value to be written to a healthy memory module that holds parity of the stripe.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
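The degraded-mode read and write paths in claim 1 can be sketched in Python, assuming RAID-5-style XOR parity over a single stripe with at most one failed module. The claim says only "parity", so XOR is an assumption. The stripe is modeled as a dictionary from hypothetical module names to equal-length blocks; the entry for a failed module is kept but never read. `xor_blocks`, `degraded_read`, and `degraded_write` are illustrative names, not an interface described in the patent.

```python
from functools import reduce
from typing import Dict, Set


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (XOR parity is an assumption here)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_read(stripe: Dict[str, bytes], target: str, failed: Set[str]) -> bytes:
    """Read one block; if its module has failed, rebuild it from the surviving modules."""
    if target not in failed:
        return stripe[target]
    if len(failed.intersection(stripe)) > 1:
        raise RuntimeError("more than one failed module in this stripe; data is lost")
    survivors = [block for module, block in stripe.items() if module != target]
    return xor_blocks(*survivors)  # XOR of surviving data blocks and the parity block


def degraded_write(stripe: Dict[str, bytes], parity: str, target: str,
                   new_data: bytes, failed: Set[str]) -> None:
    """Write one block; the parity module is assumed to be healthy."""
    old_data = degraded_read(stripe, target, failed)  # reconstructs lost pre-write data if needed
    stripe[parity] = xor_blocks(stripe[parity], old_data, new_data)
    if target not in failed:
        stripe[target] = new_data  # healthy module: store the data as usual
    # Otherwise new_data exists only implicitly, through the updated parity.


stripe = {"d0": b"\x01\x02", "d1": b"\x10\x20", "d2": b"\xaa\xbb"}
stripe["p"] = xor_blocks(*stripe.values())
failed = {"d1"}
print(degraded_read(stripe, "d1", failed).hex())  # 1020
degraded_write(stripe, "p", "d1", b"\xff\x00", failed)
print(degraded_read(stripe, "d1", failed).hex())  # ff00
```

A write aimed at the failed module first reconstructs the lost pre-write data, then uses it to update parity (new parity = old parity XOR old data XOR new data), so the only module actually written is the healthy one holding parity.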
9. A memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of redundancy controllers, the memory module comprising:
a normal mode engine to respond to primitive requests from redundancy controllers;
a fault condition engine to identify a fault condition with the memory module; and
a containment mode engine to issue, subsequent to the fault condition having been identified by the fault condition engine, containment mode indications in response to primitive requests received from redundancy controllers, wherein the containment mode indications are transmitted to the plurality of redundancy controllers so as to coordinate entry of the redundancy controllers into degraded mode.
Dependent claims: 10, 11, 12, 13.
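The memory-module side of claim 9 can be sketched as a small Python class. The names `MemoryModule`, `identify_fault`, `handle`, and `Status`, the string-valued primitives, and the example fault reason are hypothetical. Once the fault condition engine latches a fault, every later primitive from any redundancy controller, read or write, receives a containment mode indication and the write is not applied, so each controller that touches the module learns of the failure from the module itself.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Status(Enum):
    OK = "ok"
    CONTAINMENT_MODE = "containment"  # hypothetical containment mode indication


class MemoryModule:
    """Sketch of a memory module with fault condition and containment mode engines."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.fault: Optional[str] = None
        self.notified: Set[str] = set()  # controllers that have received an indication

    def identify_fault(self, reason: str) -> None:
        """Fault condition engine: latch the module into containment mode."""
        if self.fault is None:
            self.fault = reason

    def handle(self, controller: str, op: str, addr: int,
               data: bytes = b"") -> Tuple[Status, Optional[bytes]]:
        """Normal mode engine, overridden by the containment mode engine after a fault."""
        if self.fault is not None:
            # Refuse every primitive, including writes, so no controller reads
            # suspect data or modifies the module after the fault.
            self.notified.add(controller)
            return Status.CONTAINMENT_MODE, None
        if op == "read":
            return Status.OK, self.blocks.get(addr, b"")
        if op == "write":
            self.blocks[addr] = data
            return Status.OK, None
        raise ValueError(f"unknown primitive {op!r}")


mm = MemoryModule()
mm.handle("rc0", "write", 0, b"abc")
mm.identify_fault("uncorrectable memory error")  # example reason, not from the patent
print(mm.handle("rc0", "write", 0, b"xyz")[0])   # Status.CONTAINMENT_MODE
print(mm.handle("rc1", "read", 0)[0])            # Status.CONTAINMENT_MODE
print(mm.blocks[0], sorted(mm.notified))         # b'abc' ['rc0', 'rc1']
```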
14. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising:
a normal mode engine to issue a primitive request to a memory module;
a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request;
a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode; and
a journaling engine to, in response to the memory module being identified as failed, record the memory module as failed in at least one journal, wherein, prior to mounting a redundant array of independent disks (RAID) grouping of the plurality of memory modules, the redundancy controller is to examine the at least one journal associated with the RAID grouping to identify whether one or more redundancy controllers associated with the RAID grouping had previously entered degraded mode and, if so, mount the RAID grouping directly in degraded mode.
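The journal check in claim 14 can be sketched in Python as follows. The on-disk format (one JSON record per line, one journal file per redundancy controller), the `Journal` class, and the `mount` function are assumptions made for illustration; the claim does not specify how journals are stored. Before mounting, the controller reads every journal associated with the RAID grouping and, if any of them records a failed module, mounts the grouping directly in degraded mode.

```python
import json
import os
import tempfile
from typing import List, Set, Tuple


class Journal:
    """Hypothetical append-only journal, one JSON record per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record_failed(self, raid_group: str, controller: str, module: str) -> None:
        """Journaling engine: durably record that `module` was identified as failed."""
        record = {"group": raid_group, "controller": controller, "failed": module}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the record to stable storage before returning

    def failed_modules(self, raid_group: str) -> Set[str]:
        """Return every module this journal records as failed for `raid_group`."""
        if not os.path.exists(self.path):
            return set()
        with open(self.path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        return {r["failed"] for r in records if r["group"] == raid_group}


def mount(raid_group: str, journals: List[Journal]) -> Tuple[str, Set[str]]:
    """Examine every journal for the group before choosing a mount mode."""
    failed: Set[str] = set()
    for journal in journals:
        failed |= journal.failed_modules(raid_group)
    # Any prior failure record for this group means mounting directly in degraded mode.
    return ("degraded" if failed else "normal"), failed


with tempfile.TemporaryDirectory() as tmp:
    j0 = Journal(os.path.join(tmp, "rc0.journal"))
    j1 = Journal(os.path.join(tmp, "rc1.journal"))
    print(mount("raid-a", [j0, j1]))  # ('normal', set())
    j1.record_failed("raid-a", "rc1", "mm2")
    print(mount("raid-a", [j0, j1]))  # ('degraded', {'mm2'})
```

Calling `os.fsync` before returning is meant to make the failure record survive a crash that occurs just after the controller enters degraded mode, so a later mount still sees it.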
Specification