Modular array processor architecture having a plurality of interconnected load-balanced parallel processing nodes
First Claim
1. An expandable modular array processor architecture comprising:
- a plurality of processing nodes, wherein at least one of said plurality of processing nodes is operable to perform system startup, and wherein each processing node comprises:
an arithmetic processor having an input/output port for high speed receiving of data or transmitting of data from an external source that is to be processed, and dedicated local memories, said arithmetic processor further operable to execute signal processing primitive functions;
a control processor for controlling processing activity for all processors contained in the plurality of processing nodes and reallocate tasks assigned for processing in its node to available processors in other nodes based on a predetermined set of rules that are implemented by means of a heuristic task scheduling program, said control processors operable upon system startup to perform self tests and then tests of said processing nodes;
a large capacity node memory that also comprises a portion of a distributed global memory, said large capacity node memory operable to store intermediate data and results; and
a network interface coupled between the control processor, the arithmetic processor and the node memory;
a data bus coupled between respective arithmetic processors and network interfaces of each of the plurality of processing nodes; and
a control bus coupled between the respective arithmetic processors and network interfaces of each of the plurality of processing nodes;
wherein respective network interfaces link the respective arithmetic processors, node memories and control processors together to provide for communication therebetween and permit each node to communicate with respective node memories of all other processing nodes to provide for load balancing therebetween, and to buffer data transferred over the data and control buses to a respective node, and to operate as high-speed DMA controllers to transfer data between the arithmetic processor and node memory of a processing node independent of the control processor in that node.
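The task reallocation recited in the claim can be illustrated with a minimal sketch. The claim does not disclose the actual rule set of the heuristic task scheduling program, so the single rule used here (move excess tasks to whichever node currently has the most spare capacity) is an assumption chosen for illustration, as are the names `Node`, `capacity`, `queue`, `spare`, and `reallocate`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processing node; `capacity` and `queue` are illustrative fields."""
    node_id: int
    capacity: int                      # tasks the node can hold (assumed metric)
    queue: list = field(default_factory=list)

    @property
    def spare(self) -> int:
        # Negative when the node holds more tasks than its capacity.
        return self.capacity - len(self.queue)


def reallocate(nodes: list) -> list:
    """Move excess tasks from overloaded nodes to the node with most spare room.

    Returns a list of (task, source_id, destination_id) moves.
    This is one hypothetical rule, not the patent's rule set.
    """
    moves = []
    for src in nodes:
        while src.spare < 0:
            dst = max(nodes, key=lambda n: n.spare)
            if dst.spare <= 0:
                return moves           # no available processor anywhere
            task = src.queue.pop()
            dst.queue.append(task)
            moves.append((task, src.node_id, dst.node_id))
    return moves


nodes = [
    Node(0, capacity=2, queue=["a", "b", "c", "d"]),  # overloaded by two tasks
    Node(1, capacity=3, queue=["x"]),
    Node(2, capacity=2),
]
moves = reallocate(nodes)
```

In this sketch each move stands for a control-bus decision followed by a data-bus transfer into the destination node's memory, so that the reallocated task runs on locally stored data.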
Abstract
A modular array processor architecture (10) comprising a plurality of interconnected parallel processing nodes (11) that each comprise a control processor (12), an arithmetic processor (13) having an input port (22) for receiving data from an external source that is to be processed, a node memory (14) that also comprises a portion of a distributed global memory, and a network interface (15) coupled between the control processor (12), the arithmetic processor (13), and the node memory (14). Data and control buses (17, 18) are coupled between the arithmetic processors (13) and network interfaces (15) of each of the processing nodes (11). Respective network interfaces (15) link each of the arithmetic processors (13), node memories (14) and control processors (12) together to provide for communication throughout the architecture (10) and permit each node to communicate with the node memories (14) of all other processing nodes (11). This linking, along with the use of a heuristic scheduling algorithm, provides for load balancing between the processing nodes (11). Data queues are segmented and distributed across the architecture (10) in a way that the source and destination nodes (11) process data locally in the memory (14), while overflow is kept in distributed bulk memories (14). The network interfaces (15) buffer data transferred over the data and control buses (17, 18) to a respective node (11). Also, the network interfaces (15) operate as high-speed DMA controllers to transfer data between the arithmetic processor (13) and node memory (14) of a processing node (11) independent of the operation of the control processor (12) in that node (11). The control bus (17) is used to keep track of available resources throughout the architecture (10) under control of a heuristic scheduling algorithm that reallocates tasks to available arithmetic processors (13) based on a set of heuristic rules to achieve the load balancing. The data bus (18) is used to transfer data between the node memories (14) so that reallocated tasks are performed by selected arithmetic and control processors (13, 12) using data that is stored locally.
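The segmented data queues described in the abstract can be pictured with a simplified sketch. It assumes a single FIFO whose head segment sits in local node memory and whose overflow is held in a second store standing in for the distributed bulk memories. The class `SegmentedQueue`, its `local_limit` parameter, and the refill policy are hypothetical; the abstract does not specify how segments are sized or moved.

```python
from collections import deque


class SegmentedQueue:
    """FIFO whose head segment lives in local node memory.

    Items beyond `local_limit` spill into `bulk`, which stands in for
    overflow storage in other nodes' memories (an assumption of this sketch).
    """

    def __init__(self, local_limit: int):
        self.local_limit = local_limit
        self.local = deque()   # segment processed locally
        self.bulk = deque()    # overflow segment held elsewhere

    def put(self, item) -> None:
        # Keep FIFO order: once overflow exists, new items queue behind it.
        if not self.bulk and len(self.local) < self.local_limit:
            self.local.append(item)
        else:
            self.bulk.append(item)

    def get(self):
        item = self.local.popleft()   # raises IndexError when empty
        if self.bulk:
            # Refill the local segment from overflow (a bus transfer in hardware).
            self.local.append(self.bulk.popleft())
        return item

    def __len__(self) -> int:
        return len(self.local) + len(self.bulk)


q = SegmentedQueue(local_limit=2)
for sample in range(5):
    q.put(sample)
first = q.get()
```

The consumer only ever reads from the local segment, which mirrors the abstract's point that nodes process data held in their own memory while overflow waits elsewhere.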
180 Citations
7 Claims
Dependent claims: 2, 3, 4, 5, 6, 7
Specification