Scalable transport method for multicast replication
First Claim
1. A method of distributing a chunk which encodes data or object metadata within a cluster of storage servers, wherein distributing the chunk within the cluster of storage servers comprises performing a chunk put transaction, the method comprising:
- negotiating a rendezvous group by exchanging unreliable datagrams amongst an initiating client and a negotiating group to determine the rendezvous group, wherein the negotiating group comprises a subset of the storage servers, wherein said negotiating uses a cluster-consensus procedure where each member of the negotiating group evaluates delivery options for the chunk put transaction, wherein the delivery options are evaluated consistently by members of the negotiating group, and wherein said exchanging comprises multicasting the unreliable datagrams from the initiating client to the negotiating group and multicasting put accept responses from each storage server in the negotiating group to all other storage servers in the negotiating group;
- encoding the chunk in a sequence of unreliable datagrams; and
- multicasting the chunk by transmitting the sequence of unreliable datagrams in a rendezvous transfer to the rendezvous group, which is a multicast group, such that a single transmission of the sequence of the unreliable datagrams results in reception of the chunk by multiple members of the rendezvous group.
Abstract
Embodiments disclosed herein provide advantageous methods and systems that use multicast communications via unreliable datagrams sent on a protected traffic class. These methods and systems provide effectively reliable multicast delivery while avoiding the overhead associated with point-to-point protocols. Rather than an exponential scaling of point-to-point connections (with expensive setup and teardown of the connections), the traffic from one server is bounded by linear scaling of multicast groups. In addition, the multicast rendezvous disclosed herein creates an edge-managed flow control that accounts for the dynamic state of the storage servers in the cluster, without needing centralized control, management or maintenance of state. This traffic shaping avoids the loss of data due to congestion during sustained oversubscription. Other embodiments, aspects and features are also disclosed.
34 Citations
9 Claims
1. A method of distributing a chunk which encodes data or object metadata within a cluster of storage servers, wherein distributing the chunk within the cluster of storage servers comprises performing a chunk put transaction, the method comprising:
- negotiating a rendezvous group by exchanging unreliable datagrams amongst an initiating client and a negotiating group to determine the rendezvous group, wherein the negotiating group comprises a subset of the storage servers, wherein said negotiating uses a cluster-consensus procedure where each member of the negotiating group evaluates delivery options for the chunk put transaction, wherein the delivery options are evaluated consistently by members of the negotiating group, and wherein said exchanging comprises multicasting the unreliable datagrams from the initiating client to the negotiating group and multicasting put accept responses from each storage server in the negotiating group to all other storage servers in the negotiating group;
- encoding the chunk in a sequence of unreliable datagrams; and
- multicasting the chunk by transmitting the sequence of unreliable datagrams in a rendezvous transfer to the rendezvous group, which is a multicast group, such that a single transmission of the sequence of the unreliable datagrams results in reception of the chunk by multiple members of the rendezvous group.
Dependent claims: 2, 3, 4, 5, 6
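Claim 1 requires that the delivery options be evaluated consistently by every member of the negotiating group, which is possible because each member receives every other member's put accept response by multicast. The following is a minimal sketch of one way such a consistent evaluation could work, not the claimed procedure itself. The `PutAccept` fields, the `select_rendezvous_group` function, and the ranking rule (earliest available start time, ties broken by server identifier) are all assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PutAccept:
    """Hypothetical put accept response; field names are illustrative only."""
    server_id: str        # identifier of the responding storage server
    earliest_start: int   # earliest time (arbitrary ticks) the server could receive
    already_stored: bool  # True if the server already holds the chunk


def select_rendezvous_group(responses: Iterable[PutAccept], replicas: int) -> List[str]:
    """Pick `replicas` servers from the set of multicast responses.

    Sorting on (earliest_start, server_id) defines a total order, so any
    member that has seen the same set of responses computes the same group,
    regardless of the order in which the datagrams arrived.
    """
    # Assumed rule: servers that already hold the chunk do not need the transfer.
    candidates = [r for r in responses if not r.already_stored]
    ranked = sorted(candidates, key=lambda r: (r.earliest_start, r.server_id))
    return [r.server_id for r in ranked[:replicas]]


# Example: two members see the same responses in different arrival orders.
seen_by_a = [
    PutAccept("s3", 10, False),
    PutAccept("s1", 5, False),
    PutAccept("s2", 5, False),
    PutAccept("s4", 1, True),
]
seen_by_b = list(reversed(seen_by_a))
group_a = select_rendezvous_group(seen_by_a, replicas=2)
group_b = select_rendezvous_group(seen_by_b, replicas=2)
```

Because the ranking depends only on the contents of the responses, no coordinator is needed for the members to agree, provided they all received the same responses. Handling lost datagrams is outside the scope of this sketch.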
7. A method of distributing a chunk which encodes data or object metadata within a cluster of storage servers, wherein distributing the chunk within the cluster of storage servers comprises performing a chunk get transaction, the method comprising:
- negotiating a rendezvous group by exchanging unreliable datagrams amongst an initiating client and a negotiating group to determine the rendezvous group, wherein the negotiating group comprises a subset of the storage servers, and wherein said exchanging comprises multicasting the unreliable datagrams from the initiating client to the negotiating group, wherein the rendezvous group includes at least an initiating client, which is a chunk sink that initiated the chunk get transaction, wherein said negotiating the rendezvous group uses a cluster-consensus procedure where each member of the negotiating group evaluates delivery options for the chunk get transaction, wherein the delivery options are evaluated consistently by members of the negotiating group, and wherein said exchanging unreliable datagrams comprises multicasting a get request message from the initiating client to the negotiating group, and multicasting get responses from each storage server in the negotiating group to all other storage servers in the negotiating group;
- encoding the chunk in a sequence of unreliable datagrams; and
- multicasting the chunk by transmitting the sequence of unreliable datagrams in a rendezvous transfer to the rendezvous group which is a multicast group, such that a single transmission of the sequence of the unreliable datagrams results in reception of the chunk by multiple members of the rendezvous group.
Dependent claims: 8, 9
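Both independent claims include encoding the chunk in a sequence of unreliable datagrams that each member of the rendezvous group receives from a single transmission. The sketch below illustrates one simple way a chunk could be split into numbered datagram payloads and reassembled by a receiver. The 8-byte header layout (sequence number and total count), the 1400-byte payload size, and the function names `encode_chunk` and `reassemble` are assumptions chosen for illustration; the patent text quoted here does not specify a wire format.

```python
import struct
from typing import Dict, Iterable, List, Optional

# Assumed header: (sequence number, total datagram count), network byte order.
HEADER = struct.Struct("!II")


def encode_chunk(chunk: bytes, payload_size: int = 1400) -> List[bytes]:
    """Split a chunk into datagram payloads, each prefixed with the header."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    # An empty chunk still produces one (empty) datagram so the receiver sees it.
    pieces = [chunk[i:i + payload_size]
              for i in range(0, len(chunk), payload_size)] or [b""]
    total = len(pieces)
    return [HEADER.pack(seq, total) + piece for seq, piece in enumerate(pieces)]


def reassemble(datagrams: Iterable[bytes]) -> Optional[bytes]:
    """Rebuild the chunk from datagrams received in any order.

    Returns None when any datagram of the sequence is missing, since
    unreliable datagrams may be dropped in transit.
    """
    received: Dict[int, bytes] = {}
    total: Optional[int] = None
    for datagram in datagrams:
        seq, count = HEADER.unpack_from(datagram)
        total = count
        received[seq] = datagram[HEADER.size:]
    if total is None or any(i not in received for i in range(total)):
        return None
    return b"".join(received[i] for i in range(total))


# Example: a 5120-byte chunk, delivered out of order and then with a loss.
chunk = bytes(range(256)) * 20
grams = encode_chunk(chunk)
restored = reassemble(reversed(grams))
incomplete = reassemble(grams[:-1])
```

In a real deployment each element of `grams` would be sent once to the multicast address of the rendezvous group, so every member runs the same reassembly on its own copy. How a receiver recovers from a missing datagram is not covered by this sketch.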
Specification