Error recovery in a storage cluster
Abstract
A plurality of storage nodes within a single chassis is provided. The plurality of storage nodes is configured to communicate together as a storage cluster. The plurality of storage nodes has a non-volatile solid-state storage for user data storage. The plurality of storage nodes is configured to distribute the user data and metadata associated with the user data throughout the plurality of storage nodes, with erasure coding of the user data. The plurality of storage nodes is configured to recover from failure of two of the plurality of storage nodes by applying the erasure coding to the user data from a remainder of the plurality of storage nodes. The plurality of storage nodes is configured to detect an error and engage in an error recovery via one of a processor of one of the plurality of storage nodes, a processor of the non-volatile solid state storage, or the flash memory.
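The recovery from two failed nodes described above can be illustrated with a small sketch. It assumes a Reed-Solomon-style code built from polynomial interpolation over the prime field GF(257), with one byte value per node and two parity nodes; the abstract does not name a specific erasure code, and the names `encode`, `recover`, and `_lagrange_eval` are hypothetical.

```python
P = 257  # assumption: a prime field just large enough to hold one byte value


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi)] over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (x - xj)) % P
                den = (den * (xi - xj)) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, parity=2):
    """Return one share per node: the k data values followed by `parity` parity values."""
    k = len(data)
    points = list(enumerate(data))  # data node i holds the polynomial's value at x = i
    return list(data) + [_lagrange_eval(points, x) for x in range(k, k + parity)]


def recover(shares, k):
    """Rebuild the k data values from any k surviving shares (None marks a failed node)."""
    alive = [(x, y) for x, y in enumerate(shares) if y is not None][:k]
    if len(alive) < k:
        raise ValueError("too many failed nodes to recover")
    return [_lagrange_eval(alive, x) for x in range(k)]
```

With four data values and two parity values, any two of the six shares can be set to `None` and `recover(shares, 4)` still returns the original data. A production system would more likely work over GF(2^8) on whole stripes rather than single values.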
17 Claims
1. A plurality of storage nodes within a single chassis, comprising:
the plurality of storage nodes configurable to communicate together as a storage cluster, each of the plurality of storage nodes having a non-volatile solid-state storage for user data storage, the non-volatile solid state storage including flash memory, the plurality of storage nodes configured to distribute the user data and metadata associated with the user data throughout the plurality of storage nodes, with erasure coding of the user data;
the plurality of storage nodes configurable to recover from failure of two of the plurality of storage nodes by applying the erasure coding to reading the user data from a remainder of the plurality of storage nodes; and
the plurality of storage nodes configurable to detect an error and engage in an error recovery via one of a processor of one of the plurality of storage nodes, a processor of the non-volatile solid state storage, or the flash memory, wherein the plurality of storage nodes locates and accesses a mirrored remote procedure call cache in one of the plurality of storage nodes, responsive to a failure of a differing one of the plurality of storage nodes having a remote procedure call cache as mirrored by the mirrored remote procedure call cache.
Dependent claims: 2, 3, 4, 5, 6, 7
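The mirrored remote procedure call cache in the last element of claim 1 can be pictured with a minimal sketch. It assumes each node's cache is mirrored on the next node in a ring and that cache entries are keyed by a request identifier; the claim specifies neither, and the `Node` and `Cluster` classes and their methods are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.rpc_cache = {}  # request id -> cached response for RPCs this node served
        self.mirrors = {}    # primary node name -> mirrored copy of that node's cache


class Cluster:
    def __init__(self, names):
        order = list(names)
        self.nodes = {n: Node(n) for n in order}
        # Assumption: each node's cache is mirrored on the next node in a ring.
        self.mirror_of = {n: order[(i + 1) % len(order)] for i, n in enumerate(order)}

    def record(self, primary, request_id, response):
        """Store a response in the primary's cache and in its mirror on another node."""
        self.nodes[primary].rpc_cache[request_id] = response
        holder = self.nodes[self.mirror_of[primary]]
        holder.mirrors.setdefault(primary, {})[request_id] = response

    def lookup(self, primary, request_id):
        """Read from the primary's cache, or from the mirror if the primary has failed."""
        node = self.nodes[primary]
        if node.alive:
            return node.rpc_cache.get(request_id)
        holder = self.nodes[self.mirror_of[primary]]
        if not holder.alive:
            raise RuntimeError("primary and mirror are both unavailable")
        return holder.mirrors.get(primary, {}).get(request_id)
```

In this sketch, after `record("a", "req-1", "ok")` and a failure of node `a`, `lookup("a", "req-1")` is answered from the mirror held by node `b`.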
8. A storage cluster, comprising:
a plurality of storage nodes;
each of the plurality of storage nodes having a non-volatile solid-state storage with flash memory for storage of user data, the plurality of storage nodes configurable to distribute the user data, using erasure coding, throughout the plurality of storage nodes, the plurality of storage nodes configurable to distribute metadata associated with the user data, and at least one redundant copy of the metadata, throughout the plurality of storage nodes, the plurality of storage nodes is configurable to recover from loss of a remote procedure cache by reading a mirrored remote procedure call cache; and
the plurality of storage nodes configurable to recover from error selected from a group consisting of:
reading user data;
reading the metadata;
loss of one or more storage nodes;
unavailability of a portion of the user data; and
unavailability of a portion of the meta data.
Dependent claims: 9, 10, 11, 12
13. A method for error recovery in a plurality of storage nodes having non-volatile solid-state storage, comprising:
distributing user data and metadata throughout the plurality of storage nodes through erasure coding, wherein the plurality of storage nodes are housed within a single chassis that couples the storage nodes as a cluster;
detecting an error within user data retrieved from one of the plurality of storage nodes;
determining one of:
recovering from the error by accessing the user data, via the erasure coding, from a remainder of the plurality of storage nodes, responsive to a determination that two of the plurality of storage nodes are unreachable;
recovering from the error by rebuilding the user data, via the erasure coding, into the remainder of the plurality of storage nodes, responsive to the determination that the two of the plurality of storage nodes are unreachable;
recovering from the error by retrying a read of flash memory of the non-volatile solid-state storage of one of the plurality of storage nodes, responsive to a determination that the error is a bit error, wherein retrying the read of the flash memory further comprises:
applying a probabilistic calculation as to whether a data bit is more likely to be a logical “1” value or a logical “0” value; or
recovering from the error by reassigning ownership of a portion of the user data to a one of the plurality of storage nodes, responsive to a determination that a further one of the plurality of storage nodes, having ownership of the portion of the user data, is unreachable.
Dependent claims: 14, 15, 16, 17
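The read-retry branch of claim 13 calls for a probabilistic calculation per data bit without saying how it is performed. One possible reading, shown here only as a sketch, is to repeat the read several times and estimate for each bit position the probability that it is a logical 1 from the fraction of reads that returned 1. The function name `most_likely_bits` and the repeated-read approach are assumptions, not taken from the claim.

```python
def most_likely_bits(reads):
    """Pick the more likely value for each bit position across repeated reads.

    reads: equal-length sequences of 0/1 values, one per retry of the same flash read.
    Ties resolve to 0 in this sketch.
    """
    if not reads:
        raise ValueError("at least one read is required")
    n = len(reads)
    result = []
    for samples in zip(*reads):      # all observations of one bit position
        p_one = sum(samples) / n     # estimated probability that the bit is 1
        result.append(1 if p_one > 0.5 else 0)
    return result
```

For example, three reads `[1, 0, 1, 1]`, `[1, 0, 0, 1]`, and `[1, 1, 0, 1]` yield `[1, 0, 0, 1]`. Real flash controllers typically use soft-decision information such as reads at shifted voltage thresholds, which this sketch does not model.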
Specification