Providing improved message handling performance in computer systems utilizing shared network devices

US 8,166,146 B2
Filed: 09/29/2008
Issued: 04/24/2012
Est. Priority Date: 09/29/2008
Status: Expired due to Fees

Abstract

In a massively parallel computer system embodiment, when receiving a message at a compute node from an input/output node, the compute node performs the steps of: obtaining a lock on a collective network device; checking a shared storage location for a message pending for a thread; if such a message is pending, receiving the message's remaining packets directly to a user's buffer, unlocking, and returning; if no such message is pending, receiving one packet from the network device; if the packet indicates that the message is for the thread, receiving the message's remaining packets directly to the user's buffer, unlocking, and returning; and if the packet indicates that the message is for another thread, updating the shared storage location with a thread id of the other thread, unlocking, waiting for a time out, locking, and repeating from the checking step. Accordingly, data copying is eliminated with an attendant performance benefit.
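The receive procedure summarized in the abstract can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not the patented implementation: `Node`, `Header`, `receive`, and `demo` are hypothetical names, a `deque` stands in for the collective network device, and ordinary Python threads stand in for threads on separate processors. The sketch also assumes two behaviors the abstract does not spell out: a thread that finds a header parked for a different thread leaves the device untouched and simply waits, and a thread that finds the device empty waits in the same way.

```python
import threading
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Header:
    thread_id: int   # thread the message is addressed to
    n_payload: int   # number of payload packets that follow the header


@dataclass
class Node:
    """Hypothetical stand-in for the state shared by one compute node's processors."""
    fifo: deque = field(default_factory=deque)                    # models the network device
    lock: threading.Lock = field(default_factory=threading.Lock)  # lock on the device
    pending: Optional[Header] = None                              # shared storage location


def receive(node: Node, my_id: int, user_buffer: list, timeout: float = 0.001) -> None:
    node.lock.acquire()                                 # (a) lock the network device
    while True:
        header = node.pending                           # (b) check the shared storage location
        if header is None and node.fifo:
            header = node.fifo.popleft()                # (d) read one header packet from the device
        if header is not None and header.thread_id == my_id:
            node.pending = None                         # clear the parked header, if there was one
            for _ in range(header.n_payload):           # (c)/(e) payload goes straight to the user buffer
                user_buffer.append(node.fifo.popleft())
            node.lock.release()
            return
        node.pending = header                           # (f) park the header for the other thread
        node.lock.release()                             #     (stays None if the device was empty)
        time.sleep(timeout)                             #     wait for the time out
        node.lock.acquire()                             #     relock and repeat from (b)


def demo() -> dict:
    node = Node()
    # Each message arrives whole and in order: a header packet, then its payload.
    node.fifo.extend([Header(2, 2), "a0", "a1", Header(1, 1), "b0"])
    buffers = {1: [], 2: []}
    workers = [threading.Thread(target=receive, args=(node, tid, buffers[tid]))
               for tid in buffers]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return buffers


if __name__ == "__main__":
    print(demo())
```

Whichever thread wins the lock first, each buffer ends up holding only the payload addressed to its own thread. Only the one-packet header is ever parked in shared storage; payload packets are read from the device directly into the destination buffer, which is the copy the abstract says is eliminated.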

5 Claims

1. A parallel computer system, comprising:
- an input/output (I/O) node;
  
  a plurality of compute nodes coupled to each other and to the I/O node via a collective network, each compute node comprising:
  
  a compute logic block having a plurality of processors, wherein one of the processors runs a first thread, and wherein another of the processors runs a main process of an application program that spawned the first thread to run on said one of the processors;
  
  a memory array block shared by the processors;
  
  a network logic block having one or more communication blocks, wherein at least one of the communication blocks comprises a collective network device for facilitating communication of messages between the compute node and the I/O node, each message comprising a plurality of packets, wherein the collective network has a point-to-point mode that allows messages to be sent to a specific node in the collective network, wherein when sending a message to one of the compute nodes from the I/O node, all of the packets in the message are sent together so a complete message with the packets in order is delivered to the compute node, and wherein each of the messages has a one packet header that includes a thread ID identifying a thread to which the message is to be delivered;
  
  wherein when receiving a message at the compute node, the compute node performs the steps of:
  
  (a) obtaining a lock on the network device;
  
  (b) checking a shared storage location of the memory array block for a one packet header containing a thread ID identifying the first thread to see if a message is pending for the first thread;
  
  (c) if a message is pending for the first thread based on the checking step (b), receiving the remaining packets in the message directly to a user's buffer, unlocking the network device, and returning;
  
  (d) if no message is pending for the first thread based on the checking step (b), receiving a one packet header of a message from the network device;
  
  (e) if the one packet header received in step (d) indicates that the message is for the first thread, receiving the remaining packets in the message directly to the user's buffer, unlocking the network device, and returning;
  
  (f) if the one packet header received in step (d) indicates that the message is for a thread other than the first thread, updating the shared storage location of the memory array block with a thread ID of the other thread, unlocking the network device, waiting for a time out to expire, obtaining a lock on the network device, and repeating from the checking step (b);
  
  wherein an I/O node daemon runs on the I/O node, wherein a compute node kernel (CNK) runs on each of the processors, and wherein the compute node operates in at least one of a symmetric multi-processor (SMP) mode and a dual mode, comprising:
  
  when the compute node operates in the SMP mode, a first one of the processors runs a main process of an application program in the SMP mode, wherein the first thread is spawned to run on a second one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the second processor, and wherein a third one of the processors runs a second thread spawned by the application program's main process running on the first processor;
  
  when the compute node operates in the dual mode, a first one and a second one of the processors each runs a main process of an application program in the dual mode, wherein the first thread is spawned to run on a third one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the third processor, and wherein a fourth one of the processors runs a second thread spawned by the application program's main process running on the second processor.
2. The parallel computer system as recited in claim 1, wherein the network logic block includes a JTAG communication block, a torus communication block, a tree communication block, a barrier communication block and an Ethernet communication block, and wherein the network device comprises the tree communication block.

3. A computer-implemented method for providing improved message handling performance in a parallel computer system utilizing a shared network device, wherein the parallel computer system comprises an input/output (I/O) node and a plurality of compute nodes coupled to each other and to the I/O node via a collective network, each compute node comprises:
- a compute logic block having a plurality of processors, wherein one of the processors runs a first thread, and wherein another of the processors runs a main process of an application program that spawned the first thread to run on said one of the processors;
  
  a memory array block shared by the processors;
  
  a network logic block having a collective network device for facilitating communication of messages between the compute node and the I/O node, each message comprising a plurality of packets, wherein the collective network has a point-to-point mode that allows messages to be sent to a specific node in the collective network, wherein when sending a message to one of the compute nodes from the I/O node, all of the packets in the message are sent together so a complete message with the packets in order is delivered to the compute node, and wherein each of the messages has a one packet header that includes a thread ID identifying a thread to which the message is to be delivered;
  
  wherein when receiving a message at the compute node, the compute node performs the computer-implemented method comprising the steps of:
  
  (a) obtaining a lock on the network device;
  
  (b) checking a shared storage location of the memory array block for a one packet header containing a thread ID identifying the first thread to see if a message is pending for the first process;
  
  (c) if a message is pending for the first thread based on the checking step (b), receiving the remaining packets in the message directly to a user's buffer, unlocking the network device, and returning;
  
  (d) if no message is pending for the first thread based on the checking step (b), receiving a one packet header of a message from the network device;
  
  (e) if the one packet header received in step (d) indicates that the message is for the first thread, receiving the remaining packets in the message directly to the user's buffer, unlocking the network device, and returning;
  
  (f) if the one packet header received in step (d) indicates that the message is for a thread other than the first thread, updating the shared storage location of the memory array block with a thread ID of the other thread, unlocking the network device, waiting for a time out to expire, obtaining a lock on the network device, and repeating from the checking step (b);
  
  wherein an I/O node daemon runs on the I/O node, wherein a compute node kernel (CNK) runs on each of the processors, and wherein the compute node operates in at least one of a symmetric multi-processor (SMP) mode and a dual mode, comprising:
  
  when the compute node operates in the SMP mode, a first one of the processors runs a main process of an application program in the SMP mode, wherein the first thread is spawned to run on a second one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the second processor, and wherein a third one of the processors runs a second thread spawned by the application program's main process running on the first processor;
  
  when the compute node operates in the dual mode, a first one and a second one of the processors each runs a main process of an application program in the dual mode, wherein the first thread is spawned to run on a third one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the third processor, and wherein a fourth one of the processors runs a second thread spawned by the application program's main process running on the second processor.

4. A non-transitory computer readable medium for providing improved message handling performance in a parallel computer system utilizing a shared network device, wherein the parallel computer system comprises an input/output (I/O) node and a plurality of compute nodes coupled to each other and to the I/O node via a collective network, each compute node comprises:
- a compute logic block having a plurality of processors, wherein one of the processors runs a first thread, and wherein another of the processors runs a main process of an application program that spawned the first thread to run on said one of the processors;
  
  a memory array block shared by the processors; and
  
  a network logic block having a collective network device for facilitating communication of messages between the compute node and the I/O node, each message comprising a plurality of packets, wherein the collective network has a point-to-point mode that allows messages to be sent to a specific node in the collective network, wherein when sending a message to one of the compute nodes from the I/O node, all of the packets in the message are sent together so a complete message with the packets in order is delivered to the compute node, and wherein each of the messages has a one packet header that includes a thread ID identifying a thread to which the message is to be delivered;
  
  the non-transitory computer readable medium comprising a recordable media having instructions recorded thereon that when executed by one or more of the processors of the compute node cause the compute node when receiving a message to perform the steps of:
  
  (a) obtaining a lock on the network device;
  
  (b) checking a shared storage location of the memory array block for a one packet header containing a thread ID identifying the first thread to see if a message is pending for the first thread;
  
  (c) if a message is pending for the first thread based on the checking step (b), receiving the remaining packets in the message directly to a user's buffer, unlocking the network device, and returning;
  
  (d) if no message is pending for the first thread based on the checking step (b), receiving a one packet header of a message from the network device;
  
  (e) if the one packet header received in step (d) indicates that the message is for the first thread, receiving the remaining packets in the message directly to the user's buffer, unlocking the network device, and returning;
  
  (f) if the one packet header received in step (d) indicates that the message is for a thread other than the first thread, updating the shared storage location of the memory array block with a thread ID of the other thread, unlocking the network device, waiting for a time out to expire, obtaining a lock on the network device, and repeating from the checking step (b);
  
  wherein an I/O node daemon runs on the I/O node, wherein a compute node kernel (CNK) runs on each of the processors, and wherein the compute node operates in at least one of a symmetric multi-processor (SMP) mode and a dual mode, comprising:
  
  when the compute node operates in the SMP mode, a first one of the processors runs a main process of an application program in the SMP mode, wherein the first thread is spawned to run on a second one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the second processor, and wherein a third one of the processors runs a second thread spawned by the application program's main process running on the first processor;
  
  when the compute node operates in the dual mode, a first one and a second one of the processors each runs a main process of an application program in the dual mode, wherein the first thread is spawned to run on a third one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the third processor, and wherein a fourth one of the processors runs a second thread spawned by the application program's main process running on the second processor.

5. A computer-implemented method for providing improved message handling performance in a distributed computer system utilizing a shared network device, wherein the parallel computer system comprises a control system and a plurality of compute nodes coupled to the control system via a network, each compute node comprises:
- a compute logic block having a plurality of processors, wherein one of the processors runs a first thread, and wherein another of the processors runs a main process of an application program that spawned the first thread to run on said one of the processors;
  
  a memory array block shared by the processors;
  
  a network logic block having a network device for facilitating communication of messages between the compute node and the control system, each message comprising a plurality of packets, wherein the collective network has a point-to-point mode that allows messages to be sent to a specific node in the collective network, wherein when sending a message to one of the compute nodes from the I/O node, all of the packets in the message are sent together so a complete message with the packets in order is delivered to the compute node, and wherein each of the messages has a one packet header that includes a thread ID identifying a thread to which the message is to be delivered;
  
  wherein when receiving a message at the compute node from the control system, the compute node performs the computer-implemented method comprising the steps of:
  
  (a) obtaining a lock on the network device;
  
  (b) checking a shared storage location of the memory array block for a one packet header containing a thread ID identifying the first thread to see if a message is pending for the first thread;
  
  (c) if a message is pending for the first thread based on the checking step (b), receiving the remaining packets in the message directly to a user's buffer, unlocking the network device, and returning;
  
  (d) if no message is pending for the first thread based on the checking step (b), receiving a one packet header of a message from the network device;
  
  (e) if the one packet header received in step (d) indicates that the message is for the first thread, receiving the remaining packets in the message directly to the user's buffer, unlocking the network device, and returning;
  
  (f) if the one packet header received in step (d) indicates that the message is for a thread other than the first thread, updating the shared storage location of the memory array block with a thread ID of the other thread, unlocking the network device, waiting for a time out to expire, obtaining a lock on the network device, and repeating from the checking step (b);
  
  wherein a compute node kernel (CNK) runs on each of the processors, and wherein the compute node operates in at least one of a symmetric multi-processor (SMP) mode and a dual mode, comprising:
  
  when the compute node operates in the SMP mode, a first one of the processors runs a main process of an application program in the SMP mode, wherein the first thread is spawned to run on a second one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the second processor, and wherein a third one of the processors runs a second thread spawned by the application program's main process running on the first processor;
  
  when the compute node operates in the dual mode, a first one and a second one of the processors each runs a main process of an application program in the dual mode, wherein the first thread is spawned to run on a third one of the processors by the application program's main process running on the first processor, wherein steps (a)-(f) are performed by the CNK running on the third processor, and wherein a fourth one of the processors runs a second thread spawned by the application program's main process running on the second processor.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Mundy, Michael Basil
Primary Examiner(s): Follansbee, John
Assistant Examiner(s): Patel, Dhairya A
Application Number: US12/239,966
Publication Number: US 20100082788A1
Time in Patent Office: 1,303 Days
Field of Search: 714/6, 714/15, 714/20, 709/201, 709/202, 709/203, 709/213, 709/223, 709/224, 709/225
US Class Current: 709/223
CPC Class Codes:
G06F 9/52 Program synchronisation; Mu...
G06F 9/544 Buffers; Shared memory; Pipes
