Massively parallel supercomputer
Abstract
A novel massively parallel supercomputer of hundreds-of-teraOPS scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.
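The Torus network described above gives every node independent bi-directional links to its nearest neighbor in each dimension, with wraparound at the edges. A minimal sketch of how those neighbor coordinates are derived — illustrative only, not the patented implementation:

```python
# Illustrative sketch only (not the patented implementation): deriving the
# 2n nearest-neighbor links of a node in an n-dimensional torus.

def torus_neighbors(coord, dims):
    """Return the neighbor coordinates of `coord` in a torus of shape `dims`.

    Each node has independent bi-directional links to the -1 and +1
    neighbor along every dimension; the modulo gives the edge wraparound
    that closes the torus.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors

# A node in a 3-dimensional 4x4x4 torus has 6 nearest neighbors:
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The wraparound is what distinguishes a torus from a plain mesh: corner nodes still have the full 2n links, so every node is topologically identical.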
29 Claims
1. A massively parallel computing structure comprising:

a plurality of processing nodes interconnected by multiple independent networks, each node including one or more processing elements for performing computation or communication activity, or both, as required when performing parallel algorithm operations; and

partitioning means for dynamically configuring one or more combinations of said independent networks according to needs of one or more algorithms, each independent network including a configurable sub-set of processing nodes interconnected by divisible portions of said multiple independent networks, said multiple independent networks comprising networks for enabling point-to-point, global tree communications and global barrier and notification operations among said nodes or independent partitioned subsets thereof,

wherein combinations of said multiple independent networks interconnecting said nodes are collaboratively or independently utilized according to bandwidth and latency requirements of an algorithm for optimizing algorithm processing performance,

wherein each of said dynamically configured independent processing networks is utilized to enable simultaneous collaborative processing for optimizing algorithm processing performance,

wherein a first of said multiple independent networks includes an n-dimensional torus network wherein each node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or independent partitioned subsets thereof in said n-dimensional torus network.

Dependent claims: 2-22.
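The global barrier and notification operations recited in claim 1 are realized in hardware as an asynchronous signal network; as a software analogy only, a counting barrier captures the semantics — no node proceeds until every node has arrived. Node count and per-node work below are hypothetical:

```python
# Analogy only: the claimed global barrier/notification network is a
# hardware signal tree; a software barrier stands in for it here.
# The node count and per-node work are hypothetical.

import threading

N_NODES = 4
barrier = threading.Barrier(N_NODES)
released = []
lock = threading.Lock()

def node(rank):
    # ... local phase of the parallel algorithm would run here ...
    barrier.wait()              # block until every node has arrived
    with lock:
        released.append(rank)   # all nodes proceed only after the last arrival

threads = [threading.Thread(target=node, args=(r,)) for r in range(N_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(released))  # [0, 1, 2, 3]
```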
23. A scalable, massively parallel computing structure comprising:

a plurality of processing nodes interconnected by independent networks, each processing node including two or more processing elements each capable of individually or simultaneously performing any combination of computation or communication activity, or both, as required when performing parallel algorithm operations; and

a first independent network comprising an n-dimensional torus network wherein each processing node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

a second of said multiple independent networks includes a scalable global tree network comprising nodal interconnections that facilitate simultaneous global operations among nodes or sub-sets of nodes of said n-dimensional torus network; and

partitioning means for dynamically configuring one or more combinations of independent processing networks according to needs of one or more algorithms, each independent network including a configurable sub-set of processing nodes interconnected by divisible portions of said first and second networks, wherein each of said configured independent processing networks is utilized to enable simultaneous collaborative processing for optimizing algorithm processing performance.

Dependent claims: 24, 25.
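The "simultaneous global operations" facilitated by the global tree network are collective operations such as global sums or maxima. A hedged sketch of that combining pattern — values reduced pairwise level by level up a binary tree, with the root holding the global result; the tree shape and operator are illustrative assumptions:

```python
# Hedged sketch: the global combining operation a tree network supports,
# modeled as pairwise reduction up a binary tree. Tree shape and the
# reduction operator are illustrative assumptions.

def tree_combine(values, op):
    """Combine per-node values level by level up a binary tree; the root
    ends up holding the global result (which hardware would then
    broadcast back down the same tree)."""
    level = list(values)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(op(level[i], level[i + 1]))
        if len(level) % 2:          # an unpaired node promotes itself a level
            parents.append(level[-1])
        level = parents
    return level[0]

# Global sum over eight nodes' local values:
print(tree_combine([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))  # 36
```

Because each level halves the node count, the latency of such a combine grows only logarithmically with the number of participating nodes.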
26. A scalable, massively parallel computing system comprising:

a plurality of processing nodes interconnected by links to form an n-dimensional torus network, each processing node being connected by a plurality of links including independent bi-directional nearest neighbor communication links to all adjacent processing nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

communication links for further interconnecting said processing nodes to form a global combining tree network, and a global interrupt and barrier tree network for communicating global signals including interrupt signals;

partitioning means for dynamically configuring one or more combinations of said n-dimensional torus network and global combining tree network according to needs of one or more algorithms, each n-dimensional torus network and global combining tree network including a configurable sub-set of processing nodes interconnected by divisible portions of said torus network and global combining tree networks, and

link means for receiving signals from said torus and global tree networks, and said global interrupt signals, for redirecting said signals between different ports of the link means to enable the computing system to be partitioned into multiple, logically separate computing systems, each separate computing system including a configurable sub-set of processing nodes interconnected by divisible portions of said n-dimensional torus and said global combining tree networks.

Dependent claims: 27, 28.
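The partitioning recited in claim 26 can be pictured as carving the torus node set into contiguous blocks, each becoming a logically separate system. The split-along-one-dimension policy below is a simplifying assumption for illustration, not the claimed link-means mechanism:

```python
# Simplifying assumption, not the claimed link-means mechanism: partition a
# torus's node set into two logically separate systems by cutting along
# one dimension, so each partition is a contiguous block of nodes.

from itertools import product

def partition_torus(dims, axis, cut):
    """Split the nodes of a torus of shape `dims` at coordinate `cut`
    along `axis`; returns the two resulting node sets."""
    nodes = list(product(*(range(size) for size in dims)))
    part_a = [n for n in nodes if n[axis] < cut]
    part_b = [n for n in nodes if n[axis] >= cut]
    return part_a, part_b

# A 4x4x4 torus split in half along the first dimension:
half_a, half_b = partition_torus((4, 4, 4), axis=0, cut=2)
print(len(half_a), len(half_b))  # 32 32
```

Each half contains a disjoint, contiguous sub-set of nodes, so traffic inside one partition never needs links belonging to the other.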
29. A massively parallel computing system comprising:

a plurality of processing nodes interconnected by multiple independent networks, each processing node comprising a system-on-chip Application Specific Integrated Circuit (ASIC) comprising two or more processing elements each capable of performing computation or message passing operations;

a first independent network comprising an n-dimensional torus network wherein each processing node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

a second of said multiple independent networks includes a scalable global tree network comprising nodal interconnections that facilitate simultaneous global operations among nodes or sub-sets of nodes of said network; and

partitioning means for dynamically configuring one or more combinations of independent processing networks according to needs of one or more algorithms, each independent network including a configured sub-set of processing nodes interconnected by divisible portions of said first and second networks, and

means enabling rapid coordination of processing and message passing activity at each said processing element in each independent processing network, wherein one, or both, of the processing elements performs calculations needed by the algorithm, while the other, or both, of the processing elements performs message passing activities for communicating with other nodes of said network, as required when performing particular classes of algorithms, wherein each of said configured independent processing networks and node processing elements thereof are dynamically utilized to enable collaborative processing for optimizing algorithm processing performance.
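Claim 29's division of labor — one processing element computing while the other passes messages — can be sketched with two coordinated threads and a shared queue. All names and the work functions below are hypothetical illustrations, not the claimed coordination means:

```python
# Hypothetical illustration: a node's two processing elements modeled as
# two threads, one performing the algorithm's calculations while the other
# handles message passing, coordinated through a shared queue.

import queue
import threading

def run_node(work_items):
    outbox = queue.Queue()
    sent = []

    def compute_element():
        for x in work_items:
            outbox.put(x * x)   # stand-in for calculations the algorithm needs
        outbox.put(None)        # tell the comm element we are done

    def comm_element():
        while True:
            msg = outbox.get()
            if msg is None:
                break
            sent.append(msg)    # stand-in for sending to another node

    workers = [threading.Thread(target=compute_element),
               threading.Thread(target=comm_element)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sent

print(run_node([1, 2, 3]))  # [1, 4, 9]
```

The point of the overlap is that communication latency is hidden behind computation; when an algorithm phase is compute-bound, both elements could instead run `compute_element`.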
Specification