System and method for message transmission between network nodes connected by parallel links
First Claim
1. A method for sending messages from a first computer to a second computer, comprising the steps of:
- at the first computer, activating one of a plurality of communication links;
transmitting, over the activated communication link, messages from the first computer to the second computer using remote write operations to directly store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in memory of the second computer;
said remote write operations by the first computer writing the messages to global addresses that have previously been mapped to physical addresses in the second computer'"'"'s memory;
assigning each message transmitted by the first computer a sequence number, and including the sequence number in the message when it is transmitted to the second computer;
at the second computer, processing each received message, storing sequence number information indicating the sequence number of each message received and processed, and using remote write operations to directly store an acknowledgment message in memory in the first computer, without performing remote read operations to confirm storage of each acknowledgment message;
said remote write operations by the second computer writing the acknowledgment messages to global addresses that have previously been mapped to physical addresses in the first computer'"'"'s memory; and
at the first computer, upon detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent messages;
the remedial actions including, at the initiative of the first computer, retrieving from the second computer at least some of the sequence number information stored in the second computer and determining, using the sequence number information retrieved from the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer.
2 Assignments
0 Petitions
Accused Products
Abstract
A first computer sends a sequence of messages to a second computer using remote write operations to directly store each message in a corresponding memory location in the second computer. The second computer retains information denoting the sequence numbers of the messages it receives and processes, and it acknowledges each received message with an asynchronous acknowledgment message. The first computer keeps track of which messages it has sent but for which it has not yet received an acknowledgment. Whenever the first computer determines that it has failed to receive a message acknowledgment from the second computer in a timely fashion, or it needs to reuse previously used message sequence numbers, the first computer undertakes remedial actions to resynchronize the first and second computers. The process begins by prompting the second computer to flush and process all the messages in its receive FIFO, and then comparing sequence number information recorded by the second computer with the sequence numbers of the outstanding, unacknowledged messages sent by the first computer. If the comparison indicates that any messages sent by the first computer were not received and processed by the second computer, those messages are re-transmitted. If necessary, during resynchronization the first computer will activate a different communication interface than the one previously used so as to establish a reliable connection to the second computer. After a success resynchronization, normal “send only” message operation resumes. At predefined times, the sequence number information retained by the second computer is cleared.
-
Citations
16 Claims
-
1. A method for sending messages from a first computer to a second computer, comprising the steps of:
-
at the first computer, activating one of a plurality of communication links;
transmitting, over the activated communication link, messages from the first computer to the second computer using remote write operations to directly store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in memory of the second computer;
said remote write operations by the first computer writing the messages to global addresses that have previously been mapped to physical addresses in the second computer'"'"'s memory;
assigning each message transmitted by the first computer a sequence number, and including the sequence number in the message when it is transmitted to the second computer;
at the second computer, processing each received message, storing sequence number information indicating the sequence number of each message received and processed, and using remote write operations to directly store an acknowledgment message in memory in the first computer, without performing remote read operations to confirm storage of each acknowledgment message;
said remote write operations by the second computer writing the acknowledgment messages to global addresses that have previously been mapped to physical addresses in the first computer'"'"'s memory; and
at the first computer, upon detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent messages;
the remedial actions including, at the initiative of the first computer, retrieving from the second computer at least some of the sequence number information stored in the second computer and determining, using the sequence number information retrieved from the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer. - View Dependent Claims (2, 3, 4)
the remedial actions including, determining when the activated communication link has failed and activating a different one of the plurality of communication links. -
3. The method of claim 1,
at the first computer: -
establishing a circular ack message queue of entries for denoting messages sent to the second computer, and establishing a pointer to a current entry in the queue; and
for each message transmitted to the second computer, storing in a respective entry in the ack message queue a message status value indicating transmission of the respective message; and
at the second computer, responding to receipt of each respective message by;
processing each received message, including storing an acknowledgment message in a corresponding ack message queue entry in the first computer.
-
-
4. The method of claim 3, wherein, when the remedial actions determine that the second computer has already processed a message for which the first computer did not receive a corresponding ack message, the remedial actions including storing the acknowledgment message in the corresponding ack message queue entry in the first computer.
-
-
5. A method for sending messages from a first computer to a second computer, comprising the steps of:
-
at the first computer, activating one of a plurality of communication links;
transmitting, over the activated communication link, messages from the first computer to the second computer using remote write operations to directly store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in memory of the second computer;
assigning each message transmitted by the first computer a sequence number, and including the sequence number in the message when it is transmitted to the second computer;
at the second computer, processing each received message, storing sequence number information indicating the sequence number of each message received and processed, and storing an acknowledgment message in memory in the first computer; and
at the first computer, upon detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent messages;
the remedial actions including determining, at the initiative of the first computer, using the sequence number information stored in the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer;
the sequence number assigning step including;
determining when assignment of a next sequence number will cause a predefined sequence number wraparound condition;
upon making a wraparound condition determination, sending a message to the second computer to prompt the second computer to process all messages previously sent by the first computer to the second computer and retrieving from the second computer the sequence number information;
determining from the retrieved sequence number information which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer; and
assigning the next sequence number to a next message. - View Dependent Claims (6)
-
-
7. In a distributed computer system, apparatus for remotely writing messages from a first computer to a second computer, comprising:
-
at the first computer;
a CPU;
a plurality of network interfaces for transmitting and receiving messages;
a message transmission procedure, for execution by the first computer'"'"'s CPU, for activating one of the plurality of network interfaces and transmitting messages from the first computer to the second computer, via the activated network interface, using remote write operations to store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in the second computer'"'"'s memory;
said remote write operations by the first computer writing the messages to global addresses that have previously been mapped to physical addresses in the second computer'"'"'s memory;
a procedure for assigning a sequence number to each message transmitted by the first computer;
the message transmission procedure including instructions for including the sequence number in the message when it is transmitted to the second computer;
at the second computer;
a CPU;
a network interface for transmitting and receiving messages;
a receive message procedure, for execution by the second computer'"'"'s CPU, for processing each message received from the first computer, for storing sequence number information indicating the sequence number of each message received and processed, and for remotely writing, via the network interface, an acknowledgment message in a corresponding memory location in the first computer, without performing remote read operations to confirm storage of each acknowledgment message in the first computer'"'"'s memory;
wherein said remote writing writes the acknowledgment messages to global addresses that have previously been mapped to physical addresses in the first computer'"'"'s memory; and
at the first computer, the message transmission procedure including instructions for detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, and for performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent message;
the remedial actions including, at the initiative of the first computer, retrieving from the second computer at least some of the sequence number information stored in the second computer and determining, using the sequence number information retrieved from the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer.- View Dependent Claims (8)
at the second computer a set of receive buffers for receiving messages from the first computer;
a circular received message queue of entries for indicating receipt of respective messages at the first computer;
at the first computer;
a circular ack message queue of entries for denoting messages sent to the second computer, and a pointer to a current entry in the queue and a pointer to a corresponding current entry in the received message queue in the second computer;
the instructions for performing remedial actions including instructions for storing the acknowledgment message in a corresponding ack message queue entry in the first computer when the remedial actions determine that the second computer has already processed the unacknowledged previously sent message.
-
-
9. In a distributed computer system, apparatus for remotely writing messages from a first computer to a second computer, comprising:
-
at the first computer;
a CPU;
a plurality of network interfaces for transmitting and receiving messages;
a message transmission procedure, for execution by the first computer'"'"'s CPU, for activating one of the plurality of network interfaces and transmitting messages from the first computer to the second computer, via the activated network interface, using remote write operations to store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in the second computer'"'"'s memory; and
a procedure for assigning a sequence number to each message transmitted by the first computer;
the message transmission procedure including instructions for including the sequence number in the message when it is transmitted to the second computer;
at the second computer;
a CPU;
a network interface for transmitting and receiving messages;
a receive message procedure, for execution by the second computer'"'"'s CPU, for processing each message received from the first computer, for storing sequence number information indicating the sequence number of each message received and processed, and for remotely writing, via the network interface, an acknowledgment message in a corresponding memory location in the first computer;
a set of receive buffers for receiving messages from the first computer; and
a received message queue of entries for indicating receipt of respective messages at the first computer;
at the first computer, an ack message queue of entries for denoting messages sent to the second computer, and a pointer to a current entry in the queue and a pointer to a corresponding current entry in the received message queue in the second computer;
the message transmission procedure including instructions for detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, and for performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent message;
the remedial actions including, at the initiative of the first computer, retrieving from the second computer at least some of the sequence number information stored in the second computer and determining, using the sequence number information retrieved from the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer;
the message transmission procedure including instructions for remotely writing into a respective entry in the received message queue in the second computer a message status value indicating transmission of the respective message and a sequence number; and
the receive message procedure including instructions for updating the message status value in the received message queue entry corresponding to a received message to indicate that the received message has been processed; and
the instructions for performing remedial actions including instructions for remotely reading a portion of the received message queue entry in the second computer corresponding to a previously sent message for which an acknowledgment message has not been received, the remotely read portion containing a message status value and sequence number, and determining from the remotely read message status value and sequence number what additional remedial actions to perform. - View Dependent Claims (10, 11)
the instructions for performing remedial actions further including instructions for storing the acknowledgment message in the corresponding memory location in the first computer when the remotely read status value and sequence number indicate that the second computer system received and processed the corresponding previously sent message. -
11. The apparatus of claim 10,
the instructions for performing remedial actions further including instructions for repeating the writing of the message status value and sequence number to the corresponding received message queue entry in the second computer when the remotely read status value and sequence number indicate that the message status value and sequence number were not successfully written into the corresponding received message queue entry in the second computer.
-
-
12. In a distributed computer system, apparatus for remotely writing messages from a first computer to a second computer, comprising:
-
at the first computer;
a CPU;
a plurality of network interfaces for transmitting and receiving messages;
a message transmission procedure, for execution by the first computer'"'"'s CPU, for activating one of the plurality of network interfaces and transmitting messages from the first computer to the second computer, via the activated network interface, using remote write operations to store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in the second computer'"'"'s memory;
a procedure for assigning a sequence number to each message transmitted by the first computer;
the message transmission procedure including instructions for including the sequence number in the message when it is transmitted to the second computer;
at the second computer;
a CPU;
a network interface for transmitting and receiving messages;
a receive message procedure, for execution by the second computer'"'"'s CPU, for processing each message received from the first computer, for storing sequence number information indicating the sequence number of each message received and processed, and for remotely writing, via the network interface, an acknowledgment message in a corresponding memory location in the first computer;
at the first computer, the message transmission procedure including instructions for detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, and for performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent message;
the remedial actions including, at the initiative of the first computer, retrieving from the second computer at least some of the sequence number information stored in the second computer and determining, using the sequence number information retrieved from the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer;
wherein the sequence number assigning procedure includes instructions for;
determining when assignment of a next sequence number will cause a predefined sequence number wraparound condition;
upon making a wraparound condition determination, sending a message to the second computer to prompt the second computer to process all messages previously sent by the first computer to the second computer and retrieving from the second computer the sequence number information;
determining from the retrieved sequence number information which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer; and
assigning the next sequence number to a next message.
-
-
13. In a distributed computer system in which a first computer remotely writes messages to a second computer, the first computer comprising:
-
at least one network interface for transmitting and receiving messages;
a CPU for executing a plurality of procedures, the procedures including;
a message transmission procedure that transmits messages from the first computer to the second computer, via the activated network interface, using remote write operations to store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in the second computer'"'"'s memory; and
a sequence number assigning procedure that assigns a sequence number to each message transmitted by the first computer;
the message transmission procedure including instructions for including the sequence number in the message when it is transmitted to the second computer;
the sequence number assigning procedure including instructions for;
determining when assignment of a next sequence number will cause a predefined sequence number wraparound condition;
upon making a wraparound condition determination, sending a message to the second computer to prompt the second computer to process all messages previously sent by the first computer to the second computer and retrieving from the second computer sequence number information indicating the sequence numbers of messages received and processed by the second computer;
determining from the retrieved sequence number information which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer; and
assigning the next sequence number to a next message. - View Dependent Claims (14)
-
-
15. A method for sending messages from a first computer to a second computer, the method performed by the first comprising the steps of:
-
transmitting messages to the second computer using remote write operations to directly store each message in memory in the second computer, without performing remote read operations to confirm storage of each message in memory of the second computer;
assigning each message a sequence number, and including the sequence number in the message when it is transmitted to the second computer;
determining when assignment of a next sequence number will cause a predefined sequence number wraparound condition;
upon making a wraparound condition determination, sending a message to the second computer to prompt the second computer to process all messages previously sent by the first computer to the second computer and retrieving from the second computer sequence number information indicating the sequence numbers of messages received and processed by the second computer;
determining from the retrieved sequence number information which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer; and
assigning the next sequence number to a next message. - View Dependent Claims (16)
receiving from the second computer acknowledgment messages, each acknowledgment message corresponding to one of the messages transmitted by the first computer;
upon detecting a failure to receive the acknowledgment message corresponding to any of the previously sent messages, performing remedial actions to determine whether the second computer has processed the unacknowledged previously sent messages;
the remedial actions including determining, at the initiative of the first computer, using the sequence number information stored in the second computer, which messages, if any, sent by the first computer were not received and processed by the second computer, and re-transmitting to the second computer the messages, if any, determined not to have been received and processed by the second computer.
-
Specification