Apparatus and method for packet transmission over a high speed network supporting remote direct memory access operations

  • US 8,458,280 B2
  • Filed: 12/22/2005
  • Issued: 06/04/2013
  • Est. Priority Date: 04/08/2005
  • Status: Active Grant
First Claim

1. An apparatus, for performing remote direct memory access (RDMA) operations between a first server and a second server over an Ethernet fabric, the RDMA operations being initiated by execution of a verb according to a remote direct memory access protocol, the verb being executed by a central processing unit (CPU) on the first server, the apparatus comprising:

  • a network adapter in the first server, the network adapter including transaction logic, the transaction logic being to process a work queue element corresponding to the verb, and also being to accomplish the RDMA operations over a Transmission Control Protocol/Internet Protocol (TCP/IP) interface between the first and second servers, wherein said work queue element resides within first host memory in the first server, the first host memory being coupled to the CPU via a memory controller, the network adapter being coupled to the first host memory via both a host interface that is comprised in the network adapter and the memory controller, the first host memory to store an adapter driver to provide control of the network adapter, said transaction logic comprising;

    transmit history information stores, to maintain a local copy of a subset of parameters in said work queue element, the transmit history information stores including additional parameters in addition to the parameters in the work queue element, the transmit history stores being in a local memory that is comprised in the network adapter, the local memory being separate and distinct from the first host memory within which resides the work queue element, the transmit history information stores being to store the local copy and the additional parameters in one or more entries in one or more first-in-first-out (FIFO) buffers in the transmit history information stores, the one or more FIFO buffers being dynamically bound to the work queue element residing within the first host memory; and

    a protocol engine, coupled to said transmit history information stores, to access said local copy of the subset of the parameters and the additional parameters, the subset being selected so as to enable the protocol engine to rebuild, based on the local copy, for retransmission one or more TCP segments corresponding to the RDMA operations in event of network transmission error, the subset also being selected so as to enable the protocol engine to determine, based on the local copy, if the RDMA operations have been completed;

    the transaction logic in the network adapter also comprising IP address logic coupled both to a medium access controller (MAC) of the network adapter and to the protocol engine, the IP address logic to contain IP address entries to be used as source IP addresses in transmitted messages, the network adapter to compare with the IP address entries a destination IP address of an inbound datagram received by the MAC, the network adapter to process the inbound datagram in accordance with a RDMA connection processing pipeline only if the destination IP address matches one of the IP address entries, the network adapter to process the inbound datagram using a TCP/IP stack if no match to the destination IP address is in the IP address entries, the transaction logic including connection correlation logic to provide, for an outgoing transmission, mapping of a work queue number to TCP/IP routing parameters, the TCP/IP routing parameters including source and destination TCP ports and source and destination IP addresses, the one or more entries in the one or more FIFO buffers in the transmit history information stores including a plurality of such entries, each respective one of the plurality of such entries including a respective field set and corresponding with a respective corresponding one of entries in the work queue element, each respective field set including a respective sendmsn field, a respective readmsn field, a respective first flag field, a respective startseqnum field, a respective finalseqnum field, a respective sackpres field, a respective notifyoncomp field, and a respective maximum upper level protocol data unit (MULPDU) field, the respective sendmsn field maintaining a current send message sequence number, the respective readmsn field maintaining a current read message sequence number, the respective startseqnum field maintaining an initial TCP sequence number of the respective one of the entries in the work queue elements, the finalseqnum field maintaining a final TCP sequence number of a message corresponding to the respective one of the entries in the work queue elements, the startseqnum field and the finalseqnum field being provided to the respective one of the plurality of entries in the one or more FIFO buffers in the transmit history information stores during creation of a first TCP segment of the message, the respective first flag field indicating whether a TCP streaming mode, other than RDMA over TCP, is being employed to perform a TCP-offload related data transaction associated with the respective corresponding one of the entries in the work queue element, the respective MULPDU field being to record a size of a MULPDU, associated with the respective corresponding one of the entries in the work queue element, that was in effect at a previous transmission time of the MULPDU, the size recorded in the MULPDU field to be used to re-segment one or more framed protocol data units (FPDU) and to rebuild one or more TCP segments that were transmitted during the previous transmission time in event of either of the following limitations numbered (1) and (2);

    (1) a network error associated with the one or more TCP segments, and (2) dynamic changing of the size of the MULPDU, the one or more TCP segments that are rebuilt consisting of a partial FPDU if the size of the MULPDU has been dynamically changed, the respective sackpres field being to indicate whether the respective MULPDU field has been reduced by allocation for a maximum sized SACK block, the respective notifyoncomp field being to indicate whether completion queue element generation is to occur for the adapter after outstanding TCP message segment acknowledgement.
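
Illustrative note (not part of the claim): the IP address matching and connection correlation recited above can be pictured with a small software model. This is only a sketch under stated assumptions. The claim describes adapter hardware logic, and the names `AdapterSketch`, `RoutingParams`, `classify_inbound`, `bind` and `route_for` are hypothetical, chosen here for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class RoutingParams:
    """TCP/IP routing parameters named in the claim (hypothetical container)."""
    src_ip: IPv4Address
    dst_ip: IPv4Address
    src_port: int
    dst_port: int


class AdapterSketch:
    """Software stand-in for the IP address logic and connection correlation logic."""

    def __init__(self, ip_entries):
        # IP address entries; per the claim these also serve as source
        # IP addresses in transmitted messages.
        self.ip_entries = {IPv4Address(a) for a in ip_entries}
        # Assumption: a plain dict models the mapping of a work queue
        # number to TCP/IP routing parameters.
        self.connections = {}

    def classify_inbound(self, dst_ip):
        """Pick the processing path for an inbound datagram by destination IP."""
        if IPv4Address(dst_ip) in self.ip_entries:
            return "rdma_pipeline"   # match: RDMA connection processing pipeline
        return "tcpip_stack"         # no match: TCP/IP stack

    def bind(self, wq_number, params):
        """Record the routing parameters for a work queue number."""
        self.connections[wq_number] = params

    def route_for(self, wq_number):
        """Look up routing parameters for an outgoing transmission."""
        return self.connections[wq_number]
```

For example, an adapter holding the single entry `10.0.0.5` would send a datagram addressed to `10.0.0.5` down the RDMA path and one addressed to `10.0.0.9` to the TCP/IP stack.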
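
Illustrative note (not part of the claim): the transmit history entry and its use for retransmission and completion can be sketched as follows. Several points are assumptions made only for this sketch: `finalseqnum` is treated as the sequence number just past the last byte of the message, the recorded MULPDU size is treated as the byte count per rebuilt segment (real FPDUs carry additional framing that is ignored here), and the names `HistoryEntry`, `is_complete`, `resegment` and `retire_completed` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class HistoryEntry:
    """One FIFO entry, with the field set listed in the claim."""
    sendmsn: int        # current send message sequence number
    readmsn: int        # current read message sequence number
    first_flag: bool    # TCP streaming mode (other than RDMA over TCP) in use
    startseqnum: int    # initial TCP sequence number of the message
    finalseqnum: int    # final TCP sequence number (assumed exclusive here)
    sackpres: bool      # MULPDU reduced to leave room for a maximum-sized SACK block
    notifyoncomp: bool  # generate a completion queue element once acknowledged
    mulpdu: int         # MULPDU size in effect at the previous transmission


def is_complete(entry, ack):
    """True if the cumulative ACK covers the whole message (wrap-safe compare)."""
    return ((ack - entry.finalseqnum) & SEQ_MASK) < 0x80000000


def resegment(entry):
    """Rebuild (sequence number, length) pairs using the recorded MULPDU size."""
    if entry.mulpdu <= 0:
        raise ValueError("recorded MULPDU size must be positive")
    total = (entry.finalseqnum - entry.startseqnum) & SEQ_MASK
    segments, offset = [], 0
    while offset < total:
        length = min(entry.mulpdu, total - offset)
        segments.append(((entry.startseqnum + offset) & SEQ_MASK, length))
        offset += length
    return segments


def retire_completed(fifo, ack):
    """Pop acknowledged entries from the FIFO head; return those needing a CQE."""
    notify = []
    while fifo and is_complete(fifo[0], ack):
        entry = fifo.popleft()
        if entry.notifyoncomp:
            notify.append(entry)
    return notify
```

Because the segment boundaries come from the size stored in the entry rather than from the current MULPDU, a retransmission in this model reproduces the original boundaries even if the MULPDU has changed since the first transmission.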
