Massively scalable object storage for storing object replicas
First Claim
1. A method for storing data, comprising:
providing a plurality of physical storage pools, each physical storage pool including a plurality of storage nodes coupled to a network, and each storage node further providing a non-transitory computer readable medium for data storage;
mapping a first partition of a plurality of partitions to a first set of the physical storage pools, wherein the mapping includes creating a plurality of replicas of a data object for the first partition, wherein each physical storage pool of the first set thereof is located in a different availability zone, and wherein the storage nodes within a particular availability zone are subject to a correlated loss of access to data objects stored therein;
receiving a particular data management request over the network, the received data management request being associated with the data object;
modifying an attribute of the data object;
applying a hash function to the modified attribute;
identifying, based on a result of the hash function, the first partition of the plurality of partitions as corresponding to the received data management request;
manipulating the particular data object in the physical storage pools mapped to the first partition in accordance with the data management request;
modifying the result of the hash function to determine a plurality of different results;
for each data object replica of the plurality of data object replicas, storing the respective data object replica in a storage pool corresponding to a result of the plurality of different results;
determining that a first data object replica stored in a first storage pool is stored in the same availability zone as a second data object replica stored in a second storage pool; and
rebalancing the first and second data object replicas, wherein the rebalancing includes applying a constrained mapping function to bits of the results of the plurality of results, wherein a number of bits to which the constrained mapping function is applied is a partition power, and wherein after the rebalancing the first and second data object replicas are stored in different availability zones.
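The hashing and placement steps of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The salt `HASH_SALT` is a hypothetical stand-in for "modifying an attribute of the data object", MD5 is an assumed hash function, and `replica_results` is one hypothetical way to "modify the result" into several different results. `partition_from_result` keeps the top `partition_power` bits of a hash result, which is one reading of applying a mapping function to a number of bits equal to the partition power. `same_zone_pairs` illustrates the check for two replicas that land in the same availability zone.

```python
import hashlib
import struct

# Hypothetical salt: stands in for "modifying an attribute of the data object"
# before hashing. The source does not specify how the attribute is modified.
HASH_SALT = b"example-salt"


def hash_result(name: str) -> bytes:
    """Modify the attribute (append a salt), then hash it (MD5 is an assumption)."""
    return hashlib.md5(name.encode("utf-8") + HASH_SALT).digest()


def partition_from_result(result: bytes, partition_power: int) -> int:
    """Map a hash result to a partition using its top `partition_power` bits."""
    top32 = struct.unpack(">I", result[:4])[0]  # first 4 bytes, big-endian
    return top32 >> (32 - partition_power)


def replica_results(result: bytes, replicas: int) -> list[bytes]:
    """Hypothetical 'modification' of one result into several different results:
    rehash the original result together with each replica index (< 256)."""
    return [hashlib.md5(result + bytes([i])).digest() for i in range(replicas)]


def same_zone_pairs(pools: list[str], zone_of: dict[str, str]) -> list[tuple[int, int]]:
    """Return index pairs of replicas whose pools share an availability zone."""
    return [
        (i, j)
        for i in range(len(pools))
        for j in range(i + 1, len(pools))
        if zone_of[pools[i]] == zone_of[pools[j]]
    ]


if __name__ == "__main__":
    PART_POWER = 8  # 2**8 = 256 partitions
    # Hypothetical pools and availability zones.
    zone_of = {"pool-a": "zone-1", "pool-b": "zone-1", "pool-c": "zone-2", "pool-d": "zone-3"}
    pools = sorted(zone_of)

    base = hash_result("/account/container/object")
    print("partition:", partition_from_result(base, PART_POWER))

    # Place each replica in the pool its own result maps to (hypothetical table).
    placed = [pools[partition_from_result(r, PART_POWER) % len(pools)]
              for r in replica_results(base, 3)]
    print("replicas:", placed, "same-zone pairs:", same_zone_pairs(placed, zone_of))
```

A non-empty list from `same_zone_pairs` corresponds to the condition under which the claim calls for rebalancing the replicas into different availability zones.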
Abstract
An example method for storing data includes providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network. The method also includes mapping a partition of a plurality of partitions to a set of physical storage pools, where each physical storage pool of the set of physical storage pools is located in a different availability zone, and the storage nodes within an availability zone are subject to a correlated loss of access to stored data. The method further includes receiving a data management request over the network, the data management request being associated with a data object. The method also includes identifying a first partition of the plurality of partitions corresponding to the received data management request and manipulating the data object in the physical storage pools mapped to the first partition in accordance with the data management request.
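The mapping described in the abstract, in which each partition is assigned a set of pools that all sit in different availability zones, can be pictured as a simple round-robin builder. This is a minimal sketch, assuming a fixed replica count and pools grouped by zone in a dictionary; `build_mapping` and the pool and zone names are hypothetical and not taken from the source.

```python
def build_mapping(
    pools_by_zone: dict[str, list[str]], num_partitions: int, replicas: int
) -> dict[int, list[str]]:
    """Assign each partition `replicas` pools, each from a different availability zone.

    Every zone must list at least one pool.
    """
    zones = sorted(pools_by_zone)
    if replicas > len(zones):
        raise ValueError("need at least one availability zone per replica")
    mapping: dict[int, list[str]] = {}
    for part in range(num_partitions):
        chosen = []
        for r in range(replicas):
            # (part + r) % len(zones) yields `replicas` distinct zones because
            # replicas <= len(zones); rotating by `part` spreads the load.
            zone = zones[(part + r) % len(zones)]
            candidates = pools_by_zone[zone]
            chosen.append(candidates[part % len(candidates)])
        mapping[part] = chosen
    return mapping


if __name__ == "__main__":
    # Hypothetical pools grouped by availability zone.
    pools = {"z1": ["a1", "a2"], "z2": ["b1"], "z3": ["c1", "c2", "c3"]}
    mapping = build_mapping(pools, num_partitions=2**4, replicas=3)
    print(mapping[0])  # ['a1', 'b1', 'c1']
    print(mapping[1])  # ['b1', 'c2', 'a2']
```

Because no two replicas of a partition share a zone, a correlated loss of one availability zone leaves the other replicas of every partition reachable.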
12 Claims
1. A method for storing data, comprising the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A scalable online data storage system, comprising:
a distributed storage coupled to a network, the distributed storage including a first physical storage pool and a second physical storage pool from a plurality of physical storage pools, the first physical storage pool in a first availability zone and the second physical storage pool in a second availability zone, each storage pool including at least one processor, a non-transitory computer readable medium, and a communications interface;
a director coupled to the network, the director including a processor, a computer readable medium, and a communications interface;
a constrained mapping database to store a mapping of a first partition of a plurality of partitions to a first set of the physical storage pools, wherein each physical storage pool of the first set thereof is located in a different availability zone, and wherein the storage nodes within a particular availability zone are subject to a correlated loss of access to data objects stored therein; and
a ring structure associated with the director, wherein the ring structure receives a particular data management request over the network, wherein the data management request is associated with a particular data object, and wherein the ring structure modifies an attribute of the data object, applies a hash function to the modified attribute, and identifies, based on a result of the hash function, the first partition of the plurality of partitions as corresponding to the received data management request;
wherein the ring structure based on the mapping creates a plurality of replicas of the data object for the first partition, and wherein the ring structure modifies the result to determine a plurality of different results;
wherein the director manipulates the particular data object in the physical storage pools mapped to the first partition in accordance with the data management request, wherein for each data object replica of the plurality of data object replicas, the director stores the respective data object replica in a storage pool corresponding to a result of the plurality of different results, wherein the ring structure determines that a first data object replica stored in a first storage pool is stored in the same availability zone as a second data object replica stored in a second storage pool, and applies a constrained mapping function to bits of the results of the plurality of results, and wherein the constrained mapping function returns a storage pool location for each portion of the partition identification to which the constrained mapping function is applied.
Dependent claims: 10, 11, 12.
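The ring structure and constrained mapping database of claim 9 can be pictured as a small class. The sketch below is hypothetical: `Ring`, its fields, the salt, and the "first unused zone wins" rebalancing rule are illustrative assumptions, and MD5 is an assumed hash. Here `mapping` plays the role of the constrained mapping database (partition to one pool per replica), `partition` keeps the top `partition_power` bits of the hash of the modified attribute, and `rebalance` moves any replica that shares an availability zone with an earlier replica to a pool in a zone the partition does not yet use.

```python
import hashlib
import struct
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Hypothetical ring structure; names are illustrative, not from the source."""

    partition_power: int
    zone_of: dict[str, str]  # pool id -> availability zone
    # Constrained mapping database: partition -> pool id for each replica.
    mapping: dict[int, list[str]] = field(default_factory=dict)
    salt: bytes = b"example-salt"  # hypothetical attribute modification

    def partition(self, name: str) -> int:
        """Hash the modified attribute and keep the top `partition_power` bits."""
        digest = hashlib.md5(name.encode("utf-8") + self.salt).digest()
        return struct.unpack(">I", digest[:4])[0] >> (32 - self.partition_power)

    def lookup(self, name: str) -> list[str]:
        """Return the pools holding the replicas of the object's partition."""
        return self.mapping[self.partition(name)]

    def zone_conflicts(self, part: int) -> list[int]:
        """Indices of replicas whose zone is already used by an earlier replica."""
        seen: set[str] = set()
        conflicts: list[int] = []
        for i, pool in enumerate(self.mapping[part]):
            zone = self.zone_of[pool]
            if zone in seen:
                conflicts.append(i)
            seen.add(zone)
        return conflicts

    def rebalance(self, part: int) -> None:
        """Move each conflicting replica to a pool in a zone the partition does not use."""
        pools = self.mapping[part]
        for i in self.zone_conflicts(part):
            used = {self.zone_of[p] for p in pools}
            spare = sorted(p for p, z in self.zone_of.items() if z not in used)
            if not spare:
                raise RuntimeError("not enough availability zones for this partition")
            pools[i] = spare[0]


if __name__ == "__main__":
    zones = {"a1": "z1", "a2": "z1", "b1": "z2", "c1": "z3"}
    ring = Ring(partition_power=4, zone_of=zones,
                mapping={p: ["a1", "a2", "b1"] for p in range(16)})
    print(ring.zone_conflicts(0))  # [1]: a2 shares z1 with a1
    ring.rebalance(0)
    print(ring.mapping[0])         # ['a1', 'c1', 'b1']
```

In this sketch a director would call `lookup` for each data management request and then read or write every replica in the returned pools.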
Specification