Scalable transport with cluster-consensus rendezvous
First Claim
1. A method of putting a chunk of payload data in a cluster of storage servers using unreliable datagrams, the method comprising:
performing a cryptographic hash of the chunk to generate a content hash identifier for the chunk;
selecting a negotiating group for the chunk by mapping the content hash identifier to a distributed hash allocation table;
multicasting a put proposal from an initiating client to the storage servers in the cluster that are in the negotiating group for the chunk;
in response to the put proposal, multicasting a put accept response from each of the storage servers in the negotiating group to all other storage servers in the negotiating group;
evaluating the put accept responses by each of the storage servers in the negotiating group to determine members of a rendezvous group and select which storage server in the negotiating group is to send a consensus put accept to the initiating client;
receiving the consensus put accept by the initiating client; and
multicasting the payload data of the chunk from the initiating client to the rendezvous group to perform the rendezvous transfer.
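The first two steps of the claim (hashing the chunk, then mapping the content hash identifier to a negotiating group) can be illustrated with a minimal sketch. The claim only requires "a cryptographic hash" and "a distributed hash allocation table"; the choice of SHA-256, the modulo-based row lookup, the table layout, and all server names below are assumptions made for illustration, not details taken from the patent.

```python
import hashlib

# Hypothetical hash allocation table: each row is one negotiating group.
# In a real cluster every node would hold the same copy of this table.
HASH_ALLOCATION_TABLE = [
    ["server-a", "server-b", "server-c"],
    ["server-c", "server-d", "server-e"],
    ["server-e", "server-f", "server-a"],
    ["server-b", "server-d", "server-f"],
]


def content_hash_id(chunk: bytes) -> bytes:
    """Return a content hash identifier for the chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).digest()


def negotiating_group(chid: bytes, table: list) -> list:
    """Map a content hash identifier to one row of the allocation table.

    Using the first 8 bytes modulo the table size is an illustrative
    choice; the claim does not specify the mapping function.
    """
    row = int.from_bytes(chid[:8], "big") % len(table)
    return table[row]


if __name__ == "__main__":
    chid = content_hash_id(b"example chunk payload")
    print(chid.hex(), negotiating_group(chid, HASH_ALLOCATION_TABLE))
```

Because the mapping depends only on the chunk contents and the shared table, any client or server can compute the same negotiating group without consulting a central directory.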
Abstract
Embodiments disclosed herein provide advantageous methods and systems that use multicast communications via unreliable datagrams sent on a protected traffic class. These methods and systems provide effectively reliable multicast delivery while avoiding the overhead associated with point-to-point protocols. Rather than an exponential scaling of point-to-point connections (with expensive setup and teardown of the connections), the traffic from one server is bounded by linear scaling of multicast groups. In addition, the multicast rendezvous disclosed herein creates an edge-managed flow control that accounts for the dynamic state of the storage servers in the cluster, without needing centralized control, management or maintenance of state. This traffic shaping avoids the loss of data due to congestion during sustained oversubscription. Other embodiments, aspects and features are also disclosed.
22 Claims
1. A method of putting a chunk of payload data in a cluster of storage servers using unreliable datagrams, the method comprising:
performing a cryptographic hash of the chunk to generate a content hash identifier for the chunk;
selecting a negotiating group for the chunk by mapping the content hash identifier to a distributed hash allocation table;
multicasting a put proposal from an initiating client to the storage servers in the cluster that are in the negotiating group for the chunk;
in response to the put proposal, multicasting a put accept response from each of the storage servers in the negotiating group to all other storage servers in the negotiating group;
evaluating the put accept responses by each of the storage servers in the negotiating group to determine members of a rendezvous group and select which storage server in the negotiating group is to send a consensus put accept to the initiating client;
receiving the consensus put accept by the initiating client; and
multicasting the payload data of the chunk from the initiating client to the rendezvous group to perform the rendezvous transfer.
Dependent claims: 2, 3, 4, 19
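The evaluation step works without a coordinator only if every server in the negotiating group reaches the same answer from the same set of put accept responses. A minimal sketch of such a deterministic evaluation follows. The claim does not state what a put accept response contains or how responses are ranked; the `earliest_start` field, the "earliest bids win" rule, the tie-break on server id, and the `replicas` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PutAccept:
    server_id: str
    earliest_start: int  # hypothetical: earliest time the server can receive the chunk


def evaluate_put_accepts(
    responses: Iterable[PutAccept], replicas: int
) -> Tuple[List[str], str]:
    """Return (rendezvous group, server that sends the consensus put accept).

    Sorting on (earliest_start, server_id) makes the result independent of
    the order in which datagrams arrived, so every server that saw the same
    responses computes the same outcome.
    """
    ranked = sorted(responses, key=lambda r: (r.earliest_start, r.server_id))
    if not ranked:
        raise ValueError("no put accept responses to evaluate")
    group = [r.server_id for r in ranked[:replicas]]
    responder = group[0]  # assumption: the best-ranked member answers the client
    return group, responder


if __name__ == "__main__":
    seen = [PutAccept("s1", 30), PutAccept("s2", 10), PutAccept("s3", 20)]
    print(evaluate_put_accepts(seen, replicas=2))
    # A peer that received the same datagrams in another order agrees.
    print(evaluate_put_accepts(reversed(seen), replicas=2))
```

The sketch assumes every server received every response. With unreliable datagrams that is not guaranteed, and handling missing responses is outside what this example covers.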
5. A method of getting a chunk of payload data from a cluster of storage servers using unreliable datagrams, the method comprising:
performing a cryptographic hash of the chunk to generate a content hash identifier for the chunk;
selecting a negotiating group for the chunk by mapping the content hash identifier to a distributed hash allocation table;
multicasting a get request from an initiating client to the storage servers in the cluster that are in the negotiating group for the chunk;
in response to the get request, multicasting a get response from each of the storage servers in the negotiating group to all other storage servers in the negotiating group; and
evaluating the get responses by each of the storage servers in the negotiating group to determine a designated storage server that is to perform a rendezvous transfer to a rendezvous group;
multicasting a get accept from the initiating client to the negotiating group, wherein the get accept indicates the designated storage server; and
performing the rendezvous transfer by the designated storage server.
Dependent claims: 6, 7, 8, 20, 21, 22
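For the get path, the evaluation picks a single designated storage server rather than a group of receivers. The following sketch shows one way that selection and the subsequent get accept could look. The claim does not define the contents of a get response or a get accept; the `has_chunk` and `earliest_delivery` fields, the selection rule, and the dictionary message layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GetResponse:
    server_id: str
    has_chunk: bool         # hypothetical: whether this server stores the chunk
    earliest_delivery: int  # hypothetical: earliest time it could start sending


def select_designated_server(responses: Iterable[GetResponse]) -> Optional[str]:
    """Pick the server that holds the chunk and can deliver it soonest.

    Returns None when no responding server holds the chunk. Ties are
    broken on server id so that all evaluators agree.
    """
    holders = [r for r in responses if r.has_chunk]
    if not holders:
        return None
    best = min(holders, key=lambda r: (r.earliest_delivery, r.server_id))
    return best.server_id


def build_get_accept(chunk_id: str, designated: str) -> dict:
    """Build the message the client multicasts to the negotiating group."""
    return {"type": "get_accept", "chunk_id": chunk_id, "designated_server": designated}


if __name__ == "__main__":
    seen = [
        GetResponse("s1", has_chunk=False, earliest_delivery=5),
        GetResponse("s2", has_chunk=True, earliest_delivery=40),
        GetResponse("s3", has_chunk=True, earliest_delivery=15),
    ]
    designated = select_designated_server(seen)
    if designated is not None:
        print(build_get_accept("chunk-123", designated))
```

Multicasting the get accept to the whole negotiating group, rather than sending it only to the designated server, lets the servers that were not chosen learn the outcome as well.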
9. A system that stores a chunk of payload data using unreliable datagrams, the system comprising:
a cluster of storage servers;
a network of non-blocking switches communicatively interconnecting the storage servers of the cluster; and
an initiating client comprising a client system that communicatively interconnects to the network,
wherein the initiating client multicasts a put proposal to the storage servers of the cluster that are in a negotiating group for the chunk, the negotiating group being selected by mapping a content hash identifier to a distributed hash allocation table,
wherein each of the storage servers in the negotiating group multicasts put accept responses to all the other storage servers in the negotiating group,
wherein each of the storage servers in the negotiating group evaluates the put accept responses to determine members of a rendezvous group and select which storage server in the negotiating group is to send a consensus put accept to the initiating client,
wherein the consensus put accept is received by the initiating client, and
wherein the rendezvous transfer is performed by the initiating client multicasting the payload data of the chunk to the rendezvous group.
Dependent claims: 10, 11, 12, 18
13. A system that retrieves a chunk of payload data using unreliable datagrams, the system comprising:
a cluster of storage servers;
a network of non-blocking switches communicatively interconnecting the storage servers of the cluster; and
an initiating client comprising a client system that communicatively interconnects to the network,
wherein the initiating client multicasts a get request to the storage servers of the cluster that are in a negotiating group for the chunk, the negotiating group being selected by mapping a content hash identifier to a distributed hash allocation table,
wherein each of the storage servers in the negotiating group multicasts get responses to all the other storage servers in the negotiating group,
wherein each of the storage servers in the negotiating group evaluates the get responses to select a designated storage server in the negotiating group to perform a rendezvous transfer to a rendezvous group, and
wherein the rendezvous transfer is performed by the designated storage server.
Dependent claims: 14, 15, 16
17. A method of storing a chunk of payload data by a storage server within a cluster of storage servers using unreliable datagrams, the method comprising:
receiving a multicast put proposal from an initiating client by the storage server of a negotiating group, wherein the negotiating group is determined by mapping a content hash identifier to a distributed hash allocation table;
multicasting a proposal response from the storage server to other storage servers in the negotiating group; and
receiving proposal responses by the storage server from the other storage servers in the negotiating group;
evaluating the proposal responses by the storage server to determine members of a rendezvous group and select which storage server in the negotiating group is to send a consensus put accept to the initiating client;
sending the consensus put accept to the initiating client if the storage server is selected; and
receiving a multicast of the payload data of the chunk from the initiating client if the storage server is a member of the rendezvous group.
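Claim 17 describes the same put negotiation from the viewpoint of a single storage server, which ends with two local decisions: whether to send the consensus put accept, and whether to expect the payload multicast. The sketch below models only that per-server bookkeeping. The `ServerPutState` class, the integer "earliest start" bid carried in each proposal response, the ranking rule, and the `replicas` parameter are hypothetical and not taken from the claim; network I/O and datagram loss are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ServerPutState:
    server_id: str
    group_size: int  # number of servers in the negotiating group
    replicas: int    # hypothetical: desired size of the rendezvous group
    responses: Dict[str, int] = field(default_factory=dict)  # server id -> bid

    def record(self, server_id: str, earliest_start: int) -> None:
        """Store a proposal response (this server's own bid included)."""
        self.responses[server_id] = earliest_start

    def ready(self) -> bool:
        """True once a response from every group member has been recorded."""
        return len(self.responses) == self.group_size

    def decide(self) -> Dict[str, bool]:
        """Evaluate the recorded responses and return this server's two decisions."""
        ranked = sorted(self.responses, key=lambda s: (self.responses[s], s))
        rendezvous = ranked[: self.replicas]
        return {
            "send_consensus_accept": bool(rendezvous) and rendezvous[0] == self.server_id,
            "expect_payload": self.server_id in rendezvous,
        }


if __name__ == "__main__":
    bids = {"s1": 30, "s2": 10, "s3": 20}
    for sid in bids:
        state = ServerPutState(sid, group_size=len(bids), replicas=2)
        for peer, bid in bids.items():
            state.record(peer, bid)
        print(sid, state.ready(), state.decide())
```

Each server runs the same evaluation on the same inputs, so exactly one of them concludes that it should answer the client, and the set of servers expecting the payload matches the rendezvous group.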
Specification