Massively scalable object storage system
Abstract
Several different embodiments of a massively scalable object storage system are described. The object storage system is particularly useful for storage in a cloud computing installation whereby shared servers provide resources, software, and data to computers and other devices on demand. In several embodiments, the object storage system includes a ring implementation used to associate object storage commands with particular physical servers such that certain guarantees of consistency, availability, and performance can be met. In other embodiments, the object storage system includes a synchronization protocol used to order operations across a distributed system. In a third set of embodiments, the object storage system includes a metadata management system. In a fourth set of embodiments, the object storage system uses a structured information synchronization system. Features from each set of embodiments can be used to improve the performance and scalability of a cloud computing object storage system.
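As an illustration of how a ring can associate an object with a particular server, the sketch below uses consistent hashing. This is an assumption for illustration only: the abstract does not say which mapping scheme the ring uses, and the `Ring` class, the `vnodes` parameter, and the choice of MD5 are all hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Hypothetical hash ring that maps object names to servers."""

    def __init__(self, servers, vnodes=64):
        # Give each server several points on the ring to even out load.
        self._points = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    @staticmethod
    def _hash(value):
        # First 8 bytes of the MD5 digest as an integer ring position.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def lookup(self, object_name):
        # Pick the first ring point at or after the object's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._keys, self._hash(object_name))
        return self._points[idx % len(self._points)][1]


ring = Ring(["server-a", "server-b", "server-c"])
owner = ring.lookup("account/container/object")
```

The same object name always maps to the same server, and adding or removing a server moves only the objects adjacent to its ring points.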
20 Claims
1. A system for coordinating events in a distributed system, comprising:
a plurality of subsidiary nodes coupled to a network, each subsidiary node including at least one processor, a computer-readable medium, and a communications interface, wherein information in a first subsidiary node needs to be synchronized with information in a second subsidiary node in response to a time-varying series of requests;
a first subsidiary node including a first local clock, the first local clock being set according to a first timeserver;
a second subsidiary node including a second local clock, the second local clock being set according to a second timeserver;
a first synchronization rectifier minimizing differences in time between the first timeserver and the second timeserver;
wherein the first local clock and the second local clock are synchronized to within an error window ε, where ε is greater than the maximum clock skew between the first local clock and the second local clock as determined by the first synchronization rectifier; and
wherein the synchronization rectifier implements an arbiter to resolve observed time conflicts in the distributed system.
Dependent claims: 2, 3, 4, 5, 6
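The role of the error window ε can be sketched as follows. The claim requires only that ε exceed the maximum clock skew and that an arbiter resolve time conflicts; the safety margin and the tie-break on node ID below are assumptions, as are the names `Event`, `choose_epsilon`, and `arbitrate`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node_id: str
    timestamp: float  # seconds, read from the node's local clock


def choose_epsilon(max_skew, margin=0.001):
    # The claim only requires epsilon > max skew; the margin is an assumption.
    return max_skew + margin


def arbitrate(a, b, epsilon):
    """Return the event that is treated as the later of the two."""
    if abs(a.timestamp - b.timestamp) > epsilon:
        # The gap exceeds the error window, so clock order can be trusted.
        return a if a.timestamp > b.timestamp else b
    # Inside the error window the clocks cannot order the events reliably.
    # Hypothetical rule: break the tie deterministically by node ID.
    return a if a.node_id > b.node_id else b


epsilon = choose_epsilon(max_skew=0.010)
winner = arbitrate(Event("n1", 100.000), Event("n2", 100.005), epsilon)
```

Because the tie-break depends only on the events themselves, every node that runs the arbiter on the same pair reaches the same answer regardless of argument order.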
7. A system for coordinating events in a distributed system, comprising:
a geographically distributed storage across which data can be replicated, the geographically distributed storage comprising at least a first zone and a second zone, the two zones communicatively coupled with each other, wherein each zone is defined by a probable correlated loss of access or data;
each zone including a storage management server, a storage pool, a timeserver, and a synchronization rectifier, the storage pool comprising a plurality of storage nodes, including at least a first storage node and a second storage node, each storage node including at least one processor, a computer-readable medium, a communications interface, and a local clock;
wherein each storage zone associates a timestamp with the data received in a time-varying series of requests to interact with the storage pool;
wherein each timestamp associated with a received datum is provided according to a rectified time, the rectified time being adjusted by the synchronization rectifier in response to values provided by at least the local timeserver; and
wherein the synchronization rectifier implements an arbiter to resolve observed time conflicts between a local timeserver and a geographically remote timeserver.
Dependent claims: 8, 9, 10, 11, 12, 13
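One way to picture a rectified timestamp is a local clock reading corrected by an offset learned from the local timeserver. The sketch below borrows the standard NTP offset estimate, which is an assumption and is not drawn from the claim; the class and function names are hypothetical.

```python
class SynchronizationRectifier:
    """Hypothetical rectifier that tracks the local clock's offset."""

    def __init__(self):
        self.offset = 0.0  # seconds to add to the local clock

    def observe(self, t0, t1, t2, t3):
        # t0: request sent (local clock)    t1: request received (timeserver)
        # t2: reply sent (timeserver)       t3: reply received (local clock)
        # NTP-style offset estimate, assuming symmetric network delay.
        self.offset = ((t1 - t0) + (t2 - t3)) / 2.0
        return self.offset

    def rectified_time(self, local_time):
        return local_time + self.offset


def stamp(datum, local_time, rectifier):
    # Attach a rectified timestamp to a received datum.
    return {"data": datum, "timestamp": rectifier.rectified_time(local_time)}


rectifier = SynchronizationRectifier()
rectifier.observe(t0=10.0, t1=12.5, t2=12.5, t3=11.0)
record = stamp(b"payload", local_time=11.0, rectifier=rectifier)
```

In this example the local clock runs 2 seconds behind the timeserver, so a datum received at local time 11.0 is stamped 13.0.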
14. A method for coordinating events in a distributed system, comprising:
providing a geographically distributed storage across which data can be replicated, the geographically distributed storage comprising at least a first zone and a second zone, the two zones communicatively coupled with each other, wherein each zone is defined by a probable correlated loss of access or data;
providing within each zone a timeserver and a plurality of subsidiary nodes coupled to a communications network, each subsidiary node including at least one processor, a computer-readable medium, and a communications interface, a first subsidiary node including a first local clock and a second subsidiary node including a second local clock;
minimizing the differences in time between a first timeserver in the first zone and a second timeserver in the second zone via a first synchronization rectifier, such that the first local clock and the second local clock in the first zone are synchronized to within an error window ε, where ε is greater than a maximum clock skew as determined by the first synchronization rectifier; and
wherein the synchronization rectifier implements an arbiter to resolve observed time conflicts between the first timeserver and the second timeserver in the distributed system.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification