Two dimensional crossbar mesh for multi-processor interconnect
Abstract
A parallel processor array with a two-dimensional crossbar switch architecture. Individual processing elements are configured as clusters of processors, wherein the individual processing elements within each cluster are interconnected by a two dimensional cluster network of crossbar switch elements. The clusters are interconnected via a two dimensional array network of crossbar switch elements, supporting high-bandwidth inter-processor data shuffles that characterize parallel implementations of sensor processing problems. Input data is supplied directly into the array network of crossbar switch elements, which allows an optimal initial partitioning of the data set among the processing elements. The array architecture supports a virtual array sizing, where the processor array can be treated as a variable sized array with dimensions that are software controllable, selectable to match system characteristics.
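The connectivity described in the abstract can be illustrated with a small model. This is a minimal sketch, not the patented implementation: the names `switches_for` and `route`, the `row_xbar[...]`/`col_xbar[...]` labels, and the two-switch relay used when source and destination share neither a row nor a column are all assumptions made for illustration. The only property taken from the text is that each cluster attaches to one row crossbar switch element and one column crossbar switch element.

```python
from typing import List, Tuple

Cluster = Tuple[int, int]  # (row, column) position of a processor cluster in the mesh


def switches_for(cluster: Cluster) -> Tuple[str, str]:
    """Return the row and column crossbar switch elements a cluster attaches to."""
    row, col = cluster
    return (f"row_xbar[{row}]", f"col_xbar[{col}]")


def route(src: Cluster, dst: Cluster) -> List[str]:
    """List the crossbar switch elements traversed from src to dst (hypothetical policy)."""
    if src == dst:
        return []  # same cluster: no inter-cluster switch is needed
    if src[0] == dst[0]:
        return [f"row_xbar[{src[0]}]"]  # same row: one row switch suffices
    if src[1] == dst[1]:
        return [f"col_xbar[{src[1]}]"]  # same column: one column switch suffices
    # Assumed relay: cross the source row to the destination column, then take that column.
    return [f"row_xbar[{src[0]}]", f"col_xbar[{dst[1]}]"]
```

Under these assumptions, `route((0, 0), (2, 3))` passes through the row switch of row 0 and then the column switch of column 3, so any pair of clusters is separated by at most two switch elements.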
16 Claims
- 1. A parallel processor array, comprising a first plurality of processor elements configured as a second plurality of clusters of processing elements, and an interconnection network for interconnecting the processor clusters, the network including a two-dimensional mesh of multi-port crossbar switch elements arranged in rows and columns in a crossbar mesh network, each crossbar switch element including a third plurality of ports and controllable switching for operatively connecting one port of the crossbar switch element to another port of the crossbar switch element, and wherein each processor cluster is connected to a port of a row crossbar switch element and to a port of a column crossbar switch element, and wherein an input data set to be processed is supplied directly into the network via crossbar switch element input ports for initial partitioning of the data set among the processing elements, said input data set being characterized as a three dimensional data cube, said three dimensional data cube being characterized as sensor data, a first data dimension represents a sensor channel dimension, a second data dimension represents a Doppler dimension, and a third data dimension represents a Range cell dimension, and wherein the interconnection network is configurable in an initial state such that the data set is initially distributed among the processing elements for processing in a first data dimension during a first processing function, and subsequently is configurable to perform a data dimension transposition of the data set for processing in a second data dimension by the processing elements during a second processing function.
- 6. A parallel processor array, comprising:
a first plurality of processor elements configured as a second plurality of programming clusters of processing elements;
for each programming cluster, an intra-cluster interconnection network of multi-port crossbar switch elements for interconnecting the processor elements comprising the processing cluster, the intra-cluster interconnection network adapted to provide each processing element of the cluster equal access to said port of said column crossbar switch element and said port of said row crossbar switch element to which said cluster is connected, and wherein each processing cluster is organizable as a variable sized sub-array of the processing elements comprising the processing cluster;
an inter-cluster interconnection network for interconnecting the processor clusters with programmable high bandwidth data transmission links, the network including a two-dimensional mesh of multi-port crossbar switch elements arranged in rows and columns in a crossbar mesh network, each crossbar switch element including a third plurality of ports and controllable switching for operatively connecting one port of the crossbar switch element to another port of the crossbar switch element, wherein each processor cluster is connected to a port of a row crossbar switch element and to a port of a column crossbar switch element, and wherein an input data set to be processed is supplied directly into the network via crossbar switch element input ports for initial partitioning of the data set among the processing elements, wherein the input data set is characterized as a three dimensional data cube, wherein the data cube is characterized as sensor data, a first data dimension represents a sensor channel dimension, a second data dimension represents a Doppler dimension, and a third data dimension represents a Range cell dimension, and configurable in an initial state such that the data set is initially distributed among the processing elements for processing in a first data dimension during a first processing function, and subsequently is configurable to perform a data dimension transposition of the data set for processing in a second data dimension by the processing elements during a second processing function.
- View Dependent Claims (7, 8, 9)
- 10. A parallel processor array, comprising a first plurality of processor elements configured as a second plurality of clusters of processing elements, and an interconnection network for interconnecting the processor clusters, the network including a two-dimensional mesh of multi-port crossbar switch elements arranged in rows and columns in a crossbar mesh network, each crossbar switch element including a third plurality of ports and controllable switching for operatively connecting one port of the crossbar switch element to another port of the crossbar switch element, wherein the row crossbar switch elements each include an input port for receiving input data of the data set, and an output port for transferring data out of the row switch element, and wherein each processor cluster is connected to a port of a row crossbar switch element and to a port of a column crossbar switch element, and wherein an input data set to be processed is supplied directly into the network via crossbar switch element input ports for initial partitioning of the data set among the processing elements, and a third plurality of processing modules to perform subsequent processing of data by said array processing elements, and a fourth plurality of multi-port output crossbar switch elements connected between the row output ports and the processing modules to allow each processing module to communicate with multiple row switch elements.
- 14. A parallel processor array, comprising:
a first plurality of processor elements configured as a second plurality of programming clusters of processing elements;
for each programming cluster, an intra-cluster interconnection network of multi-port crossbar switch elements for interconnecting the processor elements comprising the processing cluster, the intra-cluster interconnection network adapted to provide each processing element of the cluster equal access to said port of said column crossbar switch element and said port of said row crossbar switch element to which said cluster is connected, and wherein each processing cluster is organizable as a variable sized sub-array of the processing elements comprising the processing cluster, wherein the row crossbar switch elements each include an input port for receiving input data of the data set, and an output port for transferring data out of the row switch element;
an inter-cluster interconnection network for interconnecting the processor clusters with programmable high bandwidth data transmission links, the network including a two-dimensional mesh of multi-port crossbar switch elements arranged in rows and columns in a crossbar mesh network, each crossbar switch element including a third plurality of ports and controllable switching for operatively connecting one port of the crossbar switch element to another port of the crossbar switch element, wherein each processor cluster is connected to a port of a row crossbar switch element and to a port of a column crossbar switch element, and wherein an input data set to be processed is supplied directly into the network via crossbar switch element input ports for initial partitioning of the data set among the processing elements;
and a third plurality of processing modules to perform subsequent processing of data by said array processing elements, and a fourth plurality of multi-port output crossbar switch elements connected between the row output ports and the processing modules to allow each processing module to communicate with multiple row switch elements.
- View Dependent Claims (15, 16)
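The data dimension transposition recited in claims 1 and 6 can be pictured with a toy model of the data cube. The sketch below is illustrative only and rests on assumptions not found in the claims: samples are assigned to processing elements round-robin along one "split" axis, the axes are numbered 0 (sensor channel), 1 (Doppler), and 2 (Range cell), and the function names `owner`, `vectors_are_local`, and `corner_turn_traffic` are hypothetical. It shows why a redistribution is needed between two processing functions: a processing element can only work along a dimension whose vectors it holds in full.

```python
from collections import Counter
from itertools import product
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, doppler_bins, range_cells)


def owner(index: Tuple[int, int, int], split_axis: int, num_pes: int) -> int:
    """Processing element holding a sample when the cube is split along split_axis."""
    return index[split_axis] % num_pes  # assumed round-robin assignment


def vectors_are_local(shape: Shape, split_axis: int, process_axis: int, num_pes: int) -> bool:
    """Check that every vector along process_axis resides on a single processing element."""
    for index in product(*(range(n) for n in shape)):
        start = list(index)
        start[process_axis] = 0  # first sample of the vector containing `index`
        if owner(index, split_axis, num_pes) != owner(tuple(start), split_axis, num_pes):
            return False
    return True


def corner_turn_traffic(shape: Shape, old_split: int, new_split: int, num_pes: int) -> Counter:
    """Count samples each source PE must send to each other PE during the transposition."""
    traffic: Counter = Counter()
    for index in product(*(range(n) for n in shape)):
        src = owner(index, old_split, num_pes)
        dst = owner(index, new_split, num_pes)
        if src != dst:  # samples already in place generate no network traffic
            traffic[(src, dst)] += 1
    return traffic
```

For example, with a cube of shape `(4, 4, 2)` split along the Doppler axis across two processing elements, `vectors_are_local(shape, 1, 0, 2)` holds, so channel-dimension processing can run without communication, while `vectors_are_local(shape, 1, 1, 2)` does not. Calling `corner_turn_traffic(shape, 1, 0, 2)` then tallies the exchange that re-splits the cube along the channel axis so that Doppler-dimension processing becomes local.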
Specification