System and method for providing high availability data
First Claim
1. A system, comprising:
- a distributed data store comprising a plurality of computing devices comprising respective hardware processors and memory and configured to implement a plurality storage hosts that store a plurality of data sets for the distributed data store, wherein individual ones of the storage hosts are assigned to a plurality of different hash value ranges of a set of hash values mapped to different ones of the data sets;
the distributed data store, configured to;
receive an access request over a network for one of the data sets stored in the distributed data store from a client of the distributed data store, wherein the access request identifies a key associated with the data set, wherein the access request is a request to perform a write operation to modify data in the data set;
generate a hash value for the data set based, at least in part, on the identified key according to a hash function for identifying storage hosts that store versions of the data sets;
identify a first storage host of the storage hosts that stores a first version of the data set to service the access request that is assigned to a first hash value range of the hash value ranges that includes the hash value for the data set;
from a second hash value range of the hash value ranges that is determined to be successive to the first hash value range in an order for selecting hosts from the hash value ranges, select a second storage host of the storage hosts that stores a second version of the data set, wherein the second storage host is assigned to the second hash value range;
direct the access request to the data set to be performed at both the first storage host and the second storage host before returning a response for the access request over the network to the client; and
wherein the selection of the second storage host of the storage hosts that stores the second version of the data set is performed as part of an identification of a number of storage hosts including the second storage host such that the write operation is successfully completed in satisfaction of a write quorum requirement before returning the response.
0 Assignments
0 Petitions
Accused Products
Abstract
A computer-implemented data processing system and method writes a first plurality of copies of a data set at a first plurality of hosts and reads a second plurality of copies of the data set at a second plurality of hosts. The first and second pluralities of copies may be overlapping and the first and second pluralities of hosts may be overlapping. A hashing function may be used to select the first and second pluralities of hosts. Version histories for each of the first copies of the data set may also be written at the first plurality of hosts and read at the second plurality of hosts. The version histories for the second copies of the data set may be compared and causal between the second copies of the data set may be evaluated based on the version histories for the second copies of the data set.
45 Citations
17 Claims
-
1. A system, comprising:
-
a distributed data store comprising a plurality of computing devices comprising respective hardware processors and memory and configured to implement a plurality storage hosts that store a plurality of data sets for the distributed data store, wherein individual ones of the storage hosts are assigned to a plurality of different hash value ranges of a set of hash values mapped to different ones of the data sets; the distributed data store, configured to; receive an access request over a network for one of the data sets stored in the distributed data store from a client of the distributed data store, wherein the access request identifies a key associated with the data set, wherein the access request is a request to perform a write operation to modify data in the data set; generate a hash value for the data set based, at least in part, on the identified key according to a hash function for identifying storage hosts that store versions of the data sets; identify a first storage host of the storage hosts that stores a first version of the data set to service the access request that is assigned to a first hash value range of the hash value ranges that includes the hash value for the data set; from a second hash value range of the hash value ranges that is determined to be successive to the first hash value range in an order for selecting hosts from the hash value ranges, select a second storage host of the storage hosts that stores a second version of the data set, wherein the second storage host is assigned to the second hash value range; direct the access request to the data set to be performed at both the first storage host and the second storage host before returning a response for the access request over the network to the client; and wherein the selection of the second storage host of the storage hosts that stores the second version of the data set is performed as part of an identification of a number of storage hosts including the second storage host such that the write operation is successfully completed in satisfaction of a write quorum requirement before returning the response. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method, comprising:
performing, by a plurality of computing devices; maintaining plurality of data sets at a plurality of storage hosts implemented as part of a distributed data store, wherein individual ones of the storage hosts are assigned to a plurality of different hash value ranges of a set of hash values mapped to different ones of the data sets; receiving an access request for one of the data sets stored in the distributed data store from a client of the distributed data store over a network, wherein the access request identifies a key associated with the data set, wherein the access request is a request to perform a write operation to modify data in the data set; generating a hash value for the data set based, at least in part, on the identified key according to a hash function for identifying storage hosts that store versions of the data sets; identifying a first storage host of the storage hosts that stores a version of the data set to service the access request that is assigned to a first hash value range of the hash value ranges that includes the hash value for the data set; from a second hash value range of the hash value ranges that is determined to be successive to the first hash value range in an order for selecting hosts from the hash value ranges, selecting a second storage host of the storage hosts that stores a second version of the data set, wherein the second storage host is assigned to the second hash value range; directing the access request to the data set to be performed at both the first storage host and the second storage host before returning a response for the access request to the client over the network; and wherein the selection of the second storage host of the storage hosts that stores the second version of the data set is performed as part of an identification of a number of storage hosts including the second storage host such that the write operation is successfully completed in satisfaction of a write quorum requirement before returning the response. - View Dependent Claims (7, 8, 9, 10, 11)
-
12. A non-transitory, computer-readable storage medium, storing program instructions that when executed by a plurality of computing devices cause the plurality of computing devices to implement:
-
maintaining plurality of data sets at a plurality of storage hosts implemented as part of a distributed data store, wherein individual ones of the storage hosts are assigned to a plurality of different hash value ranges of a set of hash values mapped to different ones of the data sets; receiving an access request for one of the data sets stored in the distributed data store from a client of the distributed data store over a network, wherein the access request identifies a key associated with the data set, wherein the access request is a request to perform a write operation to modify data in the data set; generating a hash value for the data set based, at least in part, on the identified key according to a hash function for identifying storage hosts that store versions of the data sets; identifying a first storage host of the storage hosts that stores a first version of the data set to service the access request that is assigned to a first hash value range of the hash value ranges that includes the hash value for the data set; from a second hash value range of the hash value ranges that is determined to be successive to the first hash value range in an order for selecting hosts from the hash value ranges, selecting a second storage host of the storage hosts that stores a second version of the data set, wherein the second storage host is assigned to the second hash value range; directing the access request to the data set to be performed at both the first storage host and the second storage host before returning a response for the access request to the client over the network; and wherein the selection of the second storage host of the storage hosts that stores the second version of the data set is performed as part of an identification of a number of storage hosts including the second storage host such that the write operation is successfully completed in satisfaction of a write quorum requirement before returning the response. - View Dependent Claims (13, 14, 15, 16, 17)
-
Specification