MASSIVELY PARALLEL COMPUTER, ACCELERATED COMPUTING CLUSTERS, AND TWO-DIMENSIONAL ROUTER AND INTERCONNECTION NETWORK FOR FIELD PROGRAMMABLE GATE ARRAYS, AND APPLICATIONS
Abstract
An embodiment of a massively parallel computing system comprising a plurality of processors, which may be subarranged into clusters of processors, and interconnected by means of a configurable directional 2D router for Networks on Chips (NOCs) is disclosed. The system further comprises diverse high bandwidth external I/O devices and interfaces, which may include without limitation Ethernet interfaces, and dynamic RAM (DRAM) memories. The system is designed for implementation in programmable logic in FPGAs, but may also be implemented in other integrated circuit technologies, such as non-programmable circuitry, and in integrated circuits such as application-specific integrated circuits (ASICs). The system enables the practical implementation of diverse FPGA computing accelerators to speed up computation for example in data centers or telecom networking infrastructure. The system uses the NOC to interconnect processors, clusters, accelerators, and/or external interfaces. A great diversity of NOC client cores, for communication amongst various external interfaces and devices, and on-chip interfaces and resources, may be coupled to a router in order to efficiently communicate with other NOC client cores. The system, router, and NOC enable feasible FPGA implementation of large integrated systems on chips, interconnecting hundreds of client cores over high bandwidth links, including compute and accelerator cores, industry standard IP cores, DRAM/HBM/HMC channels, PCI Express channels, and 10G/25G/40G/100G/400G networks.
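As an illustration of how a directional 2D router network could carry a message between client cores, the sketch below walks a message across a grid of routers. The abstract does not state the routing algorithm, so everything here is an assumption made for illustration: a unidirectional torus (links run only in the +X and +Y directions and wrap around), X-then-Y dimension-order routing, and the hypothetical names `route_hops`, `width`, and `height`.

```python
def route_hops(src, dst, width, height):
    """Return the router coordinates visited from src to dst.

    Assumption (not from the source): links are unidirectional and wrap
    around, and a message travels along X first, then along Y.
    """
    x, y = src
    path = [(x, y)]
    # Travel along the X ring until the destination column is reached.
    while x != dst[0]:
        x = (x + 1) % width
        path.append((x, y))
    # Then travel along the Y ring until the destination row is reached.
    while y != dst[1]:
        y = (y + 1) % height
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(route_hops((0, 0), (2, 1), width=4, height=4))
    # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the links in this sketch are one-way, a destination that lies "behind" the source is reached by wrapping around the ring rather than by reversing direction.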
139 Citations
136 Claims
1-78. (canceled)
79. An integrated circuit, comprising:
cluster circuits;
a first one of the cluster circuits including a first cluster-input bus, a first cluster-output bus, a first computing circuit, and a first interface circuit coupled to the computing circuit, the cluster-input bus, and the cluster-output bus, and configured to receive, from the computing circuit, a request to send a message that includes payload data, to generate, in response to the request, an outgoing message that includes a destination indicator and the payload data, and to cause the outgoing message to be provided on the cluster-output bus; and
a first interconnection network including routers each coupled to a respective one of the cluster circuits, and a first one of the routers coupled to the first one of the cluster circuits and including a first routing circuit configured to provide the outgoing message to a second one of the cluster circuits corresponding to the destination indicator.
Dependent claims: 80, 83, 85, 86, 87, 89, 91, 92, 93, 97, 98, 100, 101, 103, 104
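The message path recited in claim 79 can be pictured with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the names `Message`, `Cluster`, `send`, and `route_all` are hypothetical, the destination indicator is assumed to be a plain cluster index, and the whole interconnection network is collapsed into a single delivery function.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    dest: int        # destination indicator (assumed here to be a cluster index)
    payload: bytes   # payload data supplied by the computing circuit


@dataclass
class Cluster:
    ident: int
    input_bus: deque = field(default_factory=deque)   # models the cluster-input bus
    output_bus: deque = field(default_factory=deque)  # models the cluster-output bus

    def send(self, dest: int, payload: bytes) -> None:
        # Interface-circuit role: turn a send request into an outgoing
        # message and place it on the cluster-output bus.
        self.output_bus.append(Message(dest, payload))


def route_all(clusters):
    # Router role (greatly simplified): deliver each outgoing message to
    # the cluster that matches its destination indicator.
    by_id = {c.ident: c for c in clusters}
    for c in clusters:
        while c.output_bus:
            msg = c.output_bus.popleft()
            by_id[msg.dest].input_bus.append(msg)


if __name__ == "__main__":
    a, b = Cluster(0), Cluster(1)
    a.send(dest=1, payload=b"hello")
    route_all([a, b])
    print(b.input_bus[0].payload)
```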
81-82. (canceled)
84. (canceled)
88. (canceled)
90. (canceled)
94-96. (canceled)
99. (canceled)
102. (canceled)
105-107. (canceled)
108. A non-transitory computer-readable medium storing configuration data that, when received by a field-programmable gate array, causes the field-programmable gate array to instantiate:
cluster circuits;
a first one of the cluster circuits including a first cluster-input bus, a first cluster-output bus, a first computing circuit, and a first interface circuit coupled to the computing circuit, the cluster-input bus, and the cluster-output bus, and configured to receive, from the computing circuit, a request to send a message that includes payload data, to generate, in response to the request, an outgoing message that includes a destination indicator and the payload data, and to cause the outgoing message to be provided on the cluster-output bus; and
a first interconnection network including routers each coupled to a respective one of the cluster circuits, and a first one of the routers coupled to the first one of the cluster circuits and including a first routing circuit configured to provide the outgoing message to a second one of the cluster circuits corresponding to the destination indicator.
109. A method, comprising:
generating intermediate data with a first computing circuit of a first cluster circuit on an integrated circuit, the first computing circuit including one or more first processors each including a respective first instruction-executing computing core or a respective first configurable accelerator, together the one or more first processors including multiple first instruction-executing computing cores or at least one first configurable accelerator;
sending the intermediate data from the first cluster circuit to a second cluster circuit on the integrated circuit via an interconnection network on the integrated circuit; and
generating, in response to the intermediate data, first output data with a second computing circuit of the second cluster circuit, the second computing circuit including one or more second processors each including a respective second instruction-executing computing core or a respective second configurable accelerator, together the one or more second processors including multiple second instruction-executing computing cores or at least one second configurable accelerator.
Dependent claims: 110, 111, 112, 117, 121, 122, 125, 126, 127, 130, 133
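The three steps of claim 109 (generate intermediate data, send it over the interconnection network, generate output data in response) resemble a two-stage pipeline. The sketch below models that flow in software under loud assumptions: the functions `first_cluster` and `second_cluster` are hypothetical, a `queue.Queue` stands in for the on-chip network, the workload (squaring, then summing) is invented for illustration, and `None` is an assumed end-of-stream marker.

```python
from queue import Queue


def first_cluster(samples, network):
    # Stage 1: generate intermediate data (here, the square of each
    # sample) and send it toward the second cluster.
    for s in samples:
        network.put(s * s)
    network.put(None)  # assumed end-of-stream marker


def second_cluster(network):
    # Stage 2: generate output data in response to the intermediate
    # data (here, a running sum of everything received).
    total = 0
    while (item := network.get()) is not None:
        total += item
    return total


if __name__ == "__main__":
    network = Queue()  # stands in for the interconnection network
    first_cluster([1, 2, 3], network)
    print(second_cluster(network))  # 14
```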
113-116. (canceled)
118-120. (canceled)
123-124. (canceled)
128-129. (canceled)
131-132. (canceled)
134-135. (canceled)
136. A non-transitory computer-readable medium storing configuration data that, when received by a field-programmable gate array, causes the field-programmable gate array:
to generate intermediate data with a first computing circuit of a first cluster circuit on an integrated circuit, the first computing circuit including one or more first processors each including a respective first instruction-executing computing core or a respective first configurable accelerator, together the one or more first processors including multiple first instruction-executing computing cores or at least one first configurable accelerator;
to send the intermediate data from the first cluster circuit to a second cluster circuit on the integrated circuit via an interconnection network on the integrated circuit; and
to generate, in response to the intermediate data, first output data with a second computing circuit of the second cluster circuit, the second computing circuit including one or more second processors each including a respective second instruction-executing computing core or a respective second configurable accelerator, together the one or more second processors including multiple second instruction-executing computing cores or at least one second configurable accelerator.
Specification