Massively scalable object storage system for storing object replicas
First Claim
1. A method for storing data, comprising:
- providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network, each storage node further providing a non-transitory computer readable medium for data storage;
classifying a plurality of availability zones, wherein the storage nodes within an availability zone are subject to a correlated loss of access to stored data;
defining a plurality of abstract partitions, wherein each possible input data management request deterministically corresponds to one of the plurality of abstract partitions, and wherein the deterministic correspondence between each possible input data management request and an abstract partition of the plurality of abstract partitions is established using a function;
mapping the plurality of abstract partitions to the plurality of physical storage pools such that each mapped physical storage pool includes a replica of the data associated with the associated mapped abstract partition, and each replica for a particular abstract partition is mapped to a physical storage pool in a different availability zone, wherein the mapping the plurality of abstract partitions to the plurality of physical storage pools includes applying a constraint satisfaction algorithm;
receiving a data management request over the network, the data management request associated with a data object;
determining, using the data object and function, a partition identification;
identifying, using the partition identification and the constraint satisfaction algorithm, a first partition corresponding to the received data management request; and
manipulating the data object in the physical storage pools mapped to the first partition in accordance with the data management request.
7 Assignments
0 Petitions
Accused Products
Abstract
Several different embodiments of a massively scalable object storage system are described. The object storage system is particularly useful for storage in a cloud computing installation whereby shared servers provide resources, software, and data to computers and other devices on demand. In several embodiments, the object storage system includes a ring implementation used to associate object storage commands with particular physical servers such that certain guarantees of consistency, availability, and performance can be met. In other embodiments, the object storage system includes a synchronization protocol used to order operations across a distributed system. In a third set of embodiments, the object storage system includes a metadata management system. In a fourth set of embodiments, the object storage system uses a structured information synchronization system. Features from each set of embodiments can be used to improve the performance and scalability of a cloud computing object storage system.
-
Citations
23 Claims
-
1. A method for storing data, comprising:
-
providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network, each storage node further providing a non-transitory computer readable medium for data storage; classifying a plurality of availability zones, wherein the storage nodes within an availability zone are subject to a correlated loss of access to stored data; defining a plurality of abstract partitions, wherein each possible input data management request deterministically corresponds to one of the plurality of abstract partitions, and wherein the deterministic correspondence between each possible input data management request and an abstract partition of the plurality of abstract partitions is established using a function; mapping the plurality of abstract partitions to the plurality of physical storage pools such that each mapped physical storage pool includes a replica of the data associated with the associated mapped abstract partition, and each replica for a particular abstract partition is mapped to a physical storage pool in a different availability zone, wherein the mapping the plurality of abstract partitions to the plurality of physical storage pools includes applying a constraint satisfaction algorithm; receiving a data management request over the network, the data management request associated with a data object; determining, using the data object and function, a partition identification; identifying, using the partition identification and the constraint satisfaction algorithm, a first partition corresponding to the received data management request; and manipulating the data object in the physical storage pools mapped to the first partition in accordance with the data management request. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
-
-
16. A massively scalable online data storage system, comprising:
-
a distributed storage coupled to a network, the distributed storage including a first storage pool and a second storage pool from a plurality of storage pools, the first storage pool in a first availability zone and the second storage pool in a second availability zone, each storage pool including at least one processor, a non-transitory computer readable medium, and a communications interface; a director coupled to the network, the director including a processor, a computer readable medium, and a communications interface; a constrained mapping database to store a mapping of a first abstract partition to the first and second storage pools, the mapping being based on a number of abstract partitions and a number of storage pools in the plurality of storage pools and based on applying a constraint satisfaction algorithm; and a ring structure associated with the director, wherein the ring structure associates, using a hashing function, a storage request with the first abstract partition from a plurality of abstract partitions, and selectively associates a first abstract partition with a first fault-tolerant multi-master replication target, the first replication target including the first storage pool and the second storage pool; wherein a partition identification of the first abstract partition includes a plurality of parts, and the ring structure maps the first abstract partition by applying the constraint satisfaction algorithm to each of the plurality of parts, wherein the director routes inbound storage requests to the replication target and outbound storage responses from the replication target. - View Dependent Claims (17, 18, 19, 20, 21, 22)
-
-
23. A non-transitory computer readable medium containing executable instructions, which when executed on a processor:
-
at a first time, initialize a ring by retrieving a set of ring parameters, the ring parameters including a number of abstract partitions, a number of physical storage pools, and a constraint satisfaction algorithm;
performing a consistent hashing function associating a first range of inputs with a first abstract partition and a second range of inputs with a second abstract partition;
determining a plurality of outputs by applying an algorithm to a result of the performing a consistent hashing function; and
allocating the available physical storage pools by mapping each abstract partition to one or more storage pools based on applying the constraint satisfaction algorithm to each output of the plurality of outputs;at a second time, opaquely route an input request to a correct storage pool in accordance with the initialized ring; at a third time, rebalance the ring by retrieving the set of ring parameters, performing a consistent hashing function associating the range of inputs with the first abstract partition and the second range of inputs with the second abstract partition; and
allocating the available storage pools mapping each abstract partition to one or more storage pools in accordance with the set of performance constraints such that each abstract partition has zero or one changes in the physical storage pools allocated thereto.
-
Specification