GROUP-BASED DATA REPLICATION IN MULTI-TENANT STORAGE SYSTEMS
First Claim
1. A method for data replication in a distributed storage system having a plurality of storage nodes interconnected by a remote direct memory access (“RDMA”) network, the storage nodes individually having a processor, a memory, and an RDMA enabled network interface card (“RNIC”) operatively coupled to one another, the method comprising:
writing, from a first RNIC at a first storage node, a block of data from a first memory at the first storage node to a second memory at a second storage node via a second RNIC interconnected to the first RNIC in the RDMA network;
sending, from the first RNIC to the second storage node, metadata representing a memory location and a data size of the written block of data in the second memory via the second RNIC;
performing, at the second storage node, modification of a memory descriptor in the second memory according to the metadata, the memory descriptor being a part of a data structure representing a pre-posted work request for writing a copy of the block of data to a third storage node; and
upon completion of modifying the memory descriptor, writing, from the second RNIC, a copy of the block of data to a third memory at the third storage node via a third RNIC interconnected to the second RNIC in the RDMA network, thereby achieving replication of the block of data in the distributed storage system without using the processors at the second and third storage nodes.
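The four steps of the claim can be sketched as a small in-process model. This is only an illustration under stated assumptions, not the patented implementation and not real RDMA verbs code: the names `Node`, `WorkRequest`, `rnic_write`, and `deliver_metadata` are hypothetical, memories are plain byte arrays, and the copy is assumed to land at the same address on the third node (the claim does not fix that location).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkRequest:
    """Hypothetical pre-posted write; its memory descriptor starts out empty."""
    target: "Node"
    addr: Optional[int] = None  # descriptor field: where the block sits locally
    size: Optional[int] = None  # descriptor field: how many bytes to forward


@dataclass
class Node:
    name: str
    memory: bytearray = field(default_factory=lambda: bytearray(64))
    pre_posted: Optional[WorkRequest] = None


def rnic_write(target: Node, addr: int, data: bytes) -> None:
    """Model of a one-sided write: data is placed directly in target memory."""
    target.memory[addr:addr + len(data)] = data


def deliver_metadata(node: Node, addr: int, size: int) -> None:
    """Model of metadata landing on the descriptor of the pre-posted request.

    Filling in the descriptor is what lets the waiting request run; no
    application logic on `node` inspects the block itself.
    """
    wr = node.pre_posted
    if wr is None:
        return  # last node in the chain: nothing to forward
    wr.addr, wr.size = addr, size
    block = bytes(node.memory[addr:addr + size])
    # Assumption: the copy is written to the same address on the next node.
    rnic_write(wr.target, addr, block)


# Chain: first -> second -> third
third = Node("third")
second = Node("second", pre_posted=WorkRequest(target=third))
first = Node("first")

data = b"block-A"
rnic_write(second, 8, data)            # step 1: write the block to the second node
deliver_metadata(second, 8, len(data))  # steps 2-4: metadata patches the descriptor,
                                        # which releases the forwarding write
print(bytes(third.memory[8:8 + len(data)]))
```

In this model the second node never runs any code of its own between receiving the block and forwarding it; the only thing that changes is the descriptor of the request that was posted ahead of time.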
Abstract
Distributed storage systems, devices, and associated methods of data replication are disclosed herein. In one embodiment, a server in a distributed storage system is configured to write, with an RDMA enabled NIC, a block of data from a memory of the server to a memory at another server via an RDMA network. Upon completion of writing the block of data to the another server, the server can also send metadata representing a memory location and a data size of the written block of data in the memory of the another server via the RDMA network. The sent metadata is to be written into a memory location containing data representing a memory descriptor that is a part of a data structure representing a pre-posted work request configured to write a copy of the block of data from the another server to an additional server via the RDMA network.
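To make the idea of metadata being "written into a memory location containing data representing a memory descriptor" more concrete, the following sketch packs a location and a size into bytes and overwrites a descriptor in place. The 12-byte layout (8-byte address, 4-byte length, little-endian), the offset `DESCRIPTOR_OFFSET`, and all function names are assumptions made for illustration; the source does not specify a descriptor format.

```python
import struct

# Hypothetical descriptor layout: 8-byte address followed by 4-byte length,
# little-endian, no padding. Real RNIC descriptor formats differ.
DESCRIPTOR = struct.Struct("<QI")
DESCRIPTOR_OFFSET = 0  # assumed position of the descriptor in the memory


def pack_metadata(addr: int, size: int) -> bytes:
    """Encode the location and size of the written block."""
    return DESCRIPTOR.pack(addr, size)


def apply_metadata(memory: bytearray, metadata: bytes) -> None:
    """Overwrite the descriptor fields in place, as a remote write would."""
    end = DESCRIPTOR_OFFSET + DESCRIPTOR.size
    memory[DESCRIPTOR_OFFSET:end] = metadata


def read_descriptor(memory: bytearray) -> tuple:
    """Decode the descriptor the way the pre-posted request would see it."""
    return DESCRIPTOR.unpack_from(memory, DESCRIPTOR_OFFSET)


memory = bytearray(32)  # stands in for the receiving server's memory
apply_metadata(memory, pack_metadata(0x1000, 4096))
addr, size = read_descriptor(memory)
```

Because the sender produces bytes in exactly the descriptor's layout, the receiving side needs no parsing step: the write itself is the modification.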
20 Claims
1. A method for data replication in a distributed storage system having a plurality of storage nodes interconnected by a remote direct memory access (“RDMA”) network, the storage nodes individually having a processor, a memory, and an RDMA enabled network interface card (“RNIC”) operatively coupled to one another, the method comprising:
writing, from a first RNIC at a first storage node, a block of data from a first memory at the first storage node to a second memory at a second storage node via a second RNIC interconnected to the first RNIC in the RDMA network;
sending, from the first RNIC to the second storage node, metadata representing a memory location and a data size of the written block of data in the second memory via the second RNIC;
performing, at the second storage node, modification of a memory descriptor in the second memory according to the metadata, the memory descriptor being a part of a data structure representing a pre-posted work request for writing a copy of the block of data to a third storage node; and
upon completion of modifying the memory descriptor, writing, from the second RNIC, a copy of the block of data to a third memory at the third storage node via a third RNIC interconnected to the second RNIC in the RDMA network, thereby achieving replication of the block of data in the distributed storage system without using the processors at the second and third storage nodes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for data replication in a distributed storage system having a plurality of storage nodes interconnected by a remote direct memory access (“RDMA”) network, the storage nodes individually having a processor, a memory, and an RDMA enabled network interface card (“RNIC”) operatively coupled to one another, the method comprising:
sending, from a first RNIC at a first storage node, metadata to a memory at a second storage node via a second RNIC interconnected to the first RNIC in the RDMA network, the metadata representing memory offsets of a source region and a destination region and a data size of data to be moved;
receiving, at the second storage node, the sent metadata from the first storage node;
performing, at the second storage node, modification of a memory descriptor in the memory according to the metadata, the memory descriptor being a part of a data structure representing a pre-posted work request for writing a block of data from a first memory region to a second memory region in the memory of the second storage node; and
upon completion of modifying the memory descriptor, automatically triggering writing, by the second RNIC, a block of data having the data size from the source region of the memory to the destination region of the memory at the second storage node according to the corresponding memory offsets included in the metadata without using the processor for the writing operation at the second storage node.
Dependent claims: 11, 12, 13, 14, 15
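The local move recited in claim 10 can be sketched in the same spirit: metadata carrying a source offset, a destination offset, and a size fills in a pre-posted request, which then copies bytes within one node's memory. This is a plain software model under stated assumptions, not RNIC behavior; `StorageNode`, `CopyRequest`, and `on_metadata` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyRequest:
    """Hypothetical pre-posted local copy with an initially empty descriptor."""
    src: Optional[int] = None
    dst: Optional[int] = None
    size: Optional[int] = None


class StorageNode:
    def __init__(self, mem_size: int) -> None:
        self.memory = bytearray(mem_size)
        self.pre_posted = CopyRequest()  # posted ahead of time, before any data

    def on_metadata(self, src: int, dst: int, size: int) -> None:
        """Model of received metadata modifying the descriptor.

        Completing the descriptor triggers the copy immediately.
        """
        wr = self.pre_posted
        wr.src, wr.dst, wr.size = src, dst, size
        self._execute(wr)

    def _execute(self, wr: CopyRequest) -> None:
        # bytes() snapshots the source so overlapping regions copy correctly.
        block = bytes(self.memory[wr.src:wr.src + wr.size])
        self.memory[wr.dst:wr.dst + wr.size] = block


node = StorageNode(32)
node.memory[0:5] = b"hello"              # data already present in the source region
node.on_metadata(src=0, dst=16, size=5)  # metadata sent by the first node
print(bytes(node.memory[16:21]))
```

The sender only ships three integers; the block itself never crosses the network in this step, since both regions live in the second node's memory.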
16. A server in a distributed storage system having a plurality of servers interconnected by a remote direct memory access (“RDMA”) network, the server comprising:
a processor;
a memory; and
an RDMA enabled network interface card (“RNIC”) operatively coupled to one another, wherein the memory contains instructions executable by the processor to cause the server to:
write, with the RNIC, a block of data from the memory to another memory at another server via another RNIC interconnected to the RNIC in the RDMA network, the block of data representing an update to a data object stored in the distributed storage system;
upon completion of writing the block of data to the another server, send, from the RNIC to the another server, metadata representing a memory location and a data size of the written block of data in the another memory of the another server via the RDMA network; and
wherein the sent metadata is written into a memory location containing data representing a memory descriptor that is a part of a data structure representing a pre-posted work request configured to write a copy of the block of data to a further server via the RDMA network.
Dependent claims: 17, 18, 19, 20
Specification