Massively parallel supercomputer
Abstract
A novel massively parallel supercomputer of hundreds-of-teraOPS scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.
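The Torus network described above gives every node independent bi-directional links to its nearest neighbor in each dimension, with wraparound at the edges. A minimal sketch of how those neighbor coordinates are derived — illustrative only, not the patented implementation:

```python
# Illustrative sketch only (not the patented implementation): deriving the
# 2n nearest-neighbor links of a node in an n-dimensional torus.

def torus_neighbors(coord, dims):
    """Return the neighbor coordinates of `coord` in a torus of shape `dims`.

    Each node has independent bi-directional links to the -1 and +1
    neighbor along every dimension; the modulo gives the edge wraparound
    that closes the torus.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors

# A node in a 3-dimensional 4x4x4 torus has 6 nearest neighbors:
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The wraparound is what distinguishes a torus from a plain mesh: corner nodes still have the full 2n links, so every node is topologically identical.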
29 Claims
1. A massively parallel computing structure comprising:

a plurality of processing nodes interconnected by multiple independent networks, each node including one or more processing elements for performing computation or communication activity, or both, as required when performing parallel algorithm operations; and

partitioning means for dynamically configuring one or more combinations of said independent networks according to needs of one or more algorithms, each independent network including a configurable sub-set of processing nodes interconnected by divisible portions of said multiple independent networks, said multiple independent networks comprising networks for enabling point-to-point, global tree communications and global barrier and notification operations among said nodes or independent partitioned subsets thereof,

wherein combinations of said multiple independent networks interconnecting said nodes are collaboratively or independently utilized according to bandwidth and latency requirements of an algorithm for optimizing algorithm processing performance,

wherein each of said dynamically configured independent processing networks is utilized to enable simultaneous collaborative processing for optimizing algorithm processing performance,

wherein a first of said multiple independent networks includes an n-dimensional torus network wherein each node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or independent partitioned subsets thereof in said n-dimensional torus network.

Dependent claims: 2-22.
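The global barrier and notification operations recited in claim 1 are realized in hardware as an asynchronous signal network; as a software analogy only, a counting barrier captures the semantics — no node proceeds until every node has arrived. Node count and per-node work below are hypothetical:

```python
# Analogy only: the claimed global barrier/notification network is a
# hardware signal tree; a software barrier stands in for it here.
# The node count and per-node work are hypothetical.

import threading

N_NODES = 4
barrier = threading.Barrier(N_NODES)
released = []
lock = threading.Lock()

def node(rank):
    # ... local phase of the parallel algorithm would run here ...
    barrier.wait()              # block until every node has arrived
    with lock:
        released.append(rank)   # all nodes proceed only after the last arrival

threads = [threading.Thread(target=node, args=(r,)) for r in range(N_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(released))  # [0, 1, 2, 3]
```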
23. A scalable, massively parallel computing structure comprising:

a plurality of processing nodes interconnected by independent networks, each processing node including two or more processing elements each capable of individually or simultaneously performing any combination of computation or communication activity, or both, as required when performing parallel algorithm operations; and

a first independent network comprising an n-dimensional torus network wherein each processing node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

a second of said multiple independent networks includes a scalable global tree network comprising nodal interconnections that facilitate simultaneous global operations among nodes or sub-sets of nodes of said n-dimensional torus network; and

partitioning means for dynamically configuring one or more combinations of independent processing networks according to needs of one or more algorithms, each independent network including a configurable sub-set of processing nodes interconnected by divisible portions of said first and second networks, wherein each of said configured independent processing networks is utilized to enable simultaneous collaborative processing for optimizing algorithm processing performance.

Dependent claims: 24, 25.
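The "simultaneous global operations" facilitated by the global tree network are collective operations such as global sums or maxima. A hedged sketch of that combining pattern — values reduced pairwise level by level up a binary tree, with the root holding the global result; the tree shape and operator are illustrative assumptions:

```python
# Hedged sketch: the global combining operation a tree network supports,
# modeled as pairwise reduction up a binary tree. Tree shape and the
# reduction operator are illustrative assumptions.

def tree_combine(values, op):
    """Combine per-node values level by level up a binary tree; the root
    ends up holding the global result (which hardware would then
    broadcast back down the same tree)."""
    level = list(values)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(op(level[i], level[i + 1]))
        if len(level) % 2:          # an unpaired node promotes itself a level
            parents.append(level[-1])
        level = parents
    return level[0]

# Global sum over eight nodes' local values:
print(tree_combine([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))  # 36
```

Because each level halves the node count, the latency of such a combine grows only logarithmically with the number of participating nodes.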
26. A scalable, massively parallel computing system comprising:

a plurality of processing nodes interconnected by links to form an n-dimensional torus network, each processing node being connected by a plurality of links including independent bi-directional nearest neighbor communication links to all adjacent processing nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

communication links for further interconnecting said processing nodes to form a global combining tree network, and a global interrupt and barrier tree network for communicating global signals including interrupt signals;

partitioning means for dynamically configuring one or more combinations of said n-dimensional torus network and global combining tree network according to needs of one or more algorithms, each n-dimensional torus network and global combining tree network including a configurable sub-set of processing nodes interconnected by divisible portions of said torus network and global combining tree networks, and

link means for receiving signals from said torus and global tree networks, and said global interrupt signals, for redirecting said signals between different ports of the link means to enable the computing system to be partitioned into multiple, logically separate computing systems, each separate computing system including a configurable sub-set of processing nodes interconnected by divisible portions of said n-dimensional torus and said global combining tree networks.

Dependent claims: 27, 28.
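The partitioning recited in claim 26 can be pictured as carving the torus node set into contiguous blocks, each becoming a logically separate system. The split-along-one-dimension policy below is a simplifying assumption for illustration, not the claimed link-means mechanism:

```python
# Simplifying assumption, not the claimed link-means mechanism: partition a
# torus's node set into two logically separate systems by cutting along
# one dimension, so each partition is a contiguous block of nodes.

from itertools import product

def partition_torus(dims, axis, cut):
    """Split the nodes of a torus of shape `dims` at coordinate `cut`
    along `axis`; returns the two resulting node sets."""
    nodes = list(product(*(range(size) for size in dims)))
    part_a = [n for n in nodes if n[axis] < cut]
    part_b = [n for n in nodes if n[axis] >= cut]
    return part_a, part_b

# A 4x4x4 torus split in half along the first dimension:
half_a, half_b = partition_torus((4, 4, 4), axis=0, cut=2)
print(len(half_a), len(half_b))  # 32 32
```

Each half contains a disjoint, contiguous sub-set of nodes, so traffic inside one partition never needs links belonging to the other.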
29. A massively parallel computing system comprising:

a plurality of processing nodes interconnected by multiple independent networks, each processing node comprising a system-on-chip Application Specific Integrated Circuit (ASIC) comprising two or more processing elements each capable of performing computation or message passing operations;

a first independent network comprising an n-dimensional torus network wherein each processing node includes independent bi-directional nearest neighbor communication links to all adjacent processing nodes for interconnecting said nodes in a manner optimized for providing high-speed, low latency point-to-point and multicast packet communications among said nodes or sub-sets of nodes of said n-dimensional torus network;

a second of said multiple independent networks includes a scalable global tree network comprising nodal interconnections that facilitate simultaneous global operations among nodes or sub-sets of nodes of said network; and

partitioning means for dynamically configuring one or more combinations of independent processing networks according to needs of one or more algorithms, each independent network including a configured sub-set of processing nodes interconnected by divisible portions of said first and second networks, and

means enabling rapid coordination of processing and message passing activity at each said processing element in each independent processing network, wherein one, or both, of the processing elements performs calculations needed by the algorithm, while the other, or both, of the processing elements performs message passing activities for communicating with other nodes of said network, as required when performing particular classes of algorithms, wherein each of said configured independent processing networks and node processing elements thereof are dynamically utilized to enable collaborative processing for optimizing algorithm processing performance.
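Claim 29's division of labor — one processing element computing while the other passes messages — can be sketched with two coordinated threads and a shared queue. All names and the work functions below are hypothetical illustrations, not the claimed coordination means:

```python
# Hypothetical illustration: a node's two processing elements modeled as
# two threads, one performing the algorithm's calculations while the other
# handles message passing, coordinated through a shared queue.

import queue
import threading

def run_node(work_items):
    outbox = queue.Queue()
    sent = []

    def compute_element():
        for x in work_items:
            outbox.put(x * x)   # stand-in for calculations the algorithm needs
        outbox.put(None)        # tell the comm element we are done

    def comm_element():
        while True:
            msg = outbox.get()
            if msg is None:
                break
            sent.append(msg)    # stand-in for sending to another node

    workers = [threading.Thread(target=compute_element),
               threading.Thread(target=comm_element)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sent

print(run_node([1, 2, 3]))  # [1, 4, 9]
```

The point of the overlap is that communication latency is hidden behind computation; when an algorithm phase is compute-bound, both elements could instead run `compute_element`.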
Specification