Byzantine fault tolerance
First Claim
1. A method for fault tolerant operation of a distributed server system that includes N asynchronous servers that may experience faults, comprising:
- receiving a series of requests from a client over a time interval associated with the requests;
at each of the N servers, processing some or all of the client requests including, for each of the requests processed at a server, updating a state of a state machine at that server according to the request and transmitting a response to the client; and
resetting each of the N servers repeatedly during the time interval, wherein resetting a server includes establishing the state of the state machine at that server using data stored at other of the servers so that the state at that server corresponds to a common state of the server system;
wherein when (a) for a predetermined duration time window, fewer than N/3 of the server systems experience faults in any time window of the time interval of the requests of that predetermined duration, and (b) N/3 or more of the N servers experience faults at some time during the time interval of the requests, the N servers provide responses to the client that are sufficient for the client to determine correct responses to each of the series of requests.
3 Assignments
0 Petitions
Accused Products
Abstract
A new approach for asynchronous state-machine replication in a fault-tolerant system offers both integrity and high availability in the presence of Byzantine faults. The approach also improves the security of previous systems by recovering replicas proactively without necessarily identifying that they have failed or been attacked. This proactive recovery limits the time extent of a particular fault by regularly recovering replicas. In this way, the system works correctly even when all the replicas fail multiple times over the lifetime of the system, provided that less than ⅓ of the replicas are all faulty within a window of vulnerability. The approach also features an efficient implementation of message authentication that prevents an attacker from impersonating a replicated node that was faulty after that node recovers
235 Citations
12 Claims
-
1. A method for fault tolerant operation of a distributed server system that includes N asynchronous servers that may experience faults, comprising:
-
receiving a series of requests from a client over a time interval associated with the requests;
at each of the N servers, processing some or all of the client requests including, for each of the requests processed at a server, updating a state of a state machine at that server according to the request and transmitting a response to the client; and
resetting each of the N servers repeatedly during the time interval, wherein resetting a server includes establishing the state of the state machine at that server using data stored at other of the servers so that the state at that server corresponds to a common state of the server system;
wherein when (a) for a predetermined duration time window, fewer than N/3 of the server systems experience faults in any time window of the time interval of the requests of that predetermined duration, and (b) N/3 or more of the N servers experience faults at some time during the time interval of the requests, the N servers provide responses to the client that are sufficient for the client to determine correct responses to each of the series of requests. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
during the time interval of the requests, identifying a series of master servers from the N servers such that different servers are identified as master servers at different times;
for each of the requests from the client, (a) receiving the request at a master server, (b) establishing a common sequence number for the request among greater than ⅔
of the N servers, and(c) processing the request at servers at which the common sequence number has been established;
whereby when ⅓
or fewer of the N servers are faulty, greater than ⅓
of the N servers are not faulty and transmit a response to the client.
-
-
5. The method of claim 1 wherein establishing the state of the state machine at a server that has been reset using data stored at other of the servers includes:
-
partitioning the state into separate parts;
retaining the values of the state for the separate parts from prior to resetting the server;
for each separate part, computing a digest characterizing the retained value of the state in that part, and receiving a sufficient number of digests of that part of the state at other of the N servers to determine whether the digest matches the common value of that part of the state; and
if for any part of the state the digest computed at the server does not match the digest of the common value of that part of the state, transferring the values of at least some of that part of the state from another of the N servers.
-
-
6. The method of claim 5 wherein establishing the state of the state machine at a server that has been reset using data stored at other of the servers further includes:
-
partitioning the state into a hierarchy of parts, such that parts of the state are partitioned into sub-parts; and
if for any part of the state the digest computed at the server does not match the digest of the common value of that part of the state, computing a digest characterizing each of the sub-parts of that part, and receiving a sufficient number of digests of those sub-parts of the state at other of the N servers to determine whether the digests match the common values of those sub-parts of the state.
-
-
7. The method of claim 1 wherein processing at least some of the requests include processing a complex operation involving multiple updates to the state machine according to each of those requests.
-
8. The method of claim 1 further comprising:
-
at each of the N servers, computing symmetric keys for communicating with each of the other of the N servers, and distributing the symmetric keys to the other servers; and
repeating the steps of computing and distributing the keys during the time interval.
-
-
9. The method of claim 8 wherein distributing the symmetric keys to the other servers includes encrypting the keys in a message using public key cryptography.
-
10. In a distributed computer system that includes one or more clients and 3F+1 server nodes which in normal operation operate asynchronously and implement a common state machine and during faulty operation F or fewer of the server nodes are concurrently faulty, a method for fault-tolerant operation comprising:
-
receiving a request from a client at a designated master node of the 3F+1 server nodes;
establishing a common sequence number for the request among at least 2F+1 of the 3F+1 server nodes using a three-phase message exchange, during the first phase sending a first message from the designated master node to other of the server nodes identifying the received request, during a second phase sending a second message from each non-faulty server node that received the first message to all other of the server nodes, and during a third phase, sending a third message from each of the non-faulty server nodes that received the second message to all other of the server nodes; and
at each of F+1 or greater of the 3F+1 server nodes that are not faulty and that received the third message, processing the request and transmitting a result to the client. - View Dependent Claims (11, 12)
-
Specification