Advanced parallel array processor (APAP)
Abstract
A computer system having a plurality of processors and memory including a plurality of scalable nodes having multiple like processor memory elements. Each of the processor memory elements has a plurality of communication paths for communication within a node to other like processor memory elements within the node. Each of the processor memory elements also has a communication path for communication external to the node to another like scalable node of the computer system.
2 Claims
1. A computer system, comprising:
- a control unit, an interconnection system and a processing array for parallel processing having nodes which are interconnected with the distribution system to other processing nodes, wherein;
- the control unit is programmable and has means for enabling the processing array having an array of processing elements to operate in coordination and which also enables a system control program to operate subsets of the parallel array with each subset dedicated to different applications or different phases of a single application program's processing, wherein
- the interconnection system provides physical connections between the control unit and the elements of the parallel array of processing elements enabling data and control transfers which are completely independent of the transfer of data between elements of the processing array,
- the interconnection system distributes functions associated with data transfer between elements of the processing array and distributed functions embedded in processing node software, and
- the processing array provides non-shared memory and compute services and which are partitioned and the processing array is scalable
- wherein the control unit and interconnection system provide
- means, including a broadcast bus path, for broadcasting data and instructions to the parallel array, for accumulating data and status information from the elements of the array, for generating and accepting status information which represents the union of the status derived from the elements of the array, and for controlling how elements of the parallel array interact with the broadcast bus path,
- means for continuously and unobtrusively accumulating status from individual processing elements to facilitate programmer testing and tuning,
- means for partitioning the elements of the array into subgroups that are controlled by interleaved commands and data transfers where;
- subgroup size is specified by the application and/or system operating program and ranges from a single processing unit to the assembly of all processing units,
- subgroups may be assembled from any particular set of elements of the parallel array irregardless of the particular address information associated with the specific elements,
- means for generating program specified cross sections of the parallel array for assembly into partitions,
- means for associating with each command or data, tag information to control which partition should receive the data,
- means for writing to registers within the processor elements of the parallel array data specifying which partition code will be used to address the individual element,
- means for writing commands and data to the elements of the parallel array which are passed to the units irrespective of the current status of the partition data within any particular element of the parallel array,
- means for providing to elements of the parallel array a broadcast facility to all or subsets of the parallel array data specified by the application and/or system operating program, and such operations comprise sequential action of;
- means for permitting all or a subset of the elements in the parallel array to signal the need for the broadcast,
- means for accumulating and prioritizing broadcast requests,
- means for causing the performance of a broadcast operation sequence,
- means for initializing parallel processor system operations and for providing additional program loads when directed by an application and/or system operating program.
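As an informal illustration of the partitioning and status-union elements recited in claim 1, the following Python sketch models a control unit that tags each broadcast with a partition code, lets elements accept only matching traffic, and forms the bitwise OR of element status words. It is a software toy under stated assumptions, not the claimed hardware: the names ProcessingElement, ControlUnit, set_partition, status_union, and the ALL tag are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field
from functools import reduce

ALL = None  # hypothetical tag: addresses every element regardless of partition


@dataclass
class ProcessingElement:
    address: int
    partition: int = 0                      # models the partition-code register
    status: int = 0                         # models a word of status bits
    inbox: list = field(default_factory=list)

    def on_broadcast(self, tag, payload):
        # An element accepts a broadcast only if the tag matches its register.
        if tag is ALL or tag == self.partition:
            self.inbox.append(payload)


class ControlUnit:
    def __init__(self, n):
        self.elements = [ProcessingElement(a) for a in range(n)]

    def set_partition(self, addresses, code):
        # Written unconditionally, whatever partition the element is in now,
        # so a subgroup can be built from any set of addresses.
        for a in addresses:
            self.elements[a].partition = code

    def broadcast(self, tag, payload):
        # Every element sees the bus; the tag decides who keeps the payload.
        for pe in self.elements:
            pe.on_broadcast(tag, payload)

    def status_union(self, tag=ALL):
        # Union (bitwise OR) of the status words of the selected elements.
        selected = (pe for pe in self.elements
                    if tag is ALL or pe.partition == tag)
        return reduce(lambda acc, pe: acc | pe.status, selected, 0)


cu = ControlUnit(8)
cu.set_partition([1, 4, 6], 1)      # non-contiguous addresses form one subgroup
cu.broadcast(1, "step A")           # interleaved commands to two partitions
cu.broadcast(0, "step B")
cu.elements[4].status = 0b0100
cu.elements[6].status = 0b0001
union_partition_1 = cu.status_union(1)
```

In this model, elements 1, 4, and 6 receive only "step A", the remaining elements receive only "step B", and the status union for partition 1 combines the bits set by elements 4 and 6.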
2. A computer system, comprising:
- a control unit, an interconnection system and a processing array for parallel processing having nodes which are interconnected with the distribution system to other processing nodes, wherein;
- the control unit is programmable and has means for enabling the processing array having an array of processing elements to operate in coordination and which also enables a system control program to operate subsets of the parallel array with each subset dedicated to different applications or different phases of a single application program's processing, wherein
- the interconnection system provides physical connections between the control unit and the elements of the parallel array of processing elements enabling data and control transfers which are completely independent of the transfer of data between elements of the processing array,
- the interconnection system distributes functions associated with data transfer between elements of the processing array and distributed functions embedded in processing node software, and
- the processing array provides non-shared memory and compute services and which are partitioned and the processing array is scalable
- further comprising means for data transfer control including
- a set of two or more ports at each processing element for transfer of data from a processing element toward a destination,
- registers and counters in each processing element for each port of each processing element to manage the sending and receiving of a single block of data through the port where a block of data is variable length as determined by the application or system operating software,
- hardware paths within each port of each processing element that transition on the basis of transfer counts so as to permit software controls to setup in advance actions to take in the event of buffer full, end of message, and other events,
- means for set up of each processing element's port count, address and control features to initiate transfers out, or prepare for a transfer in,
- control means within the processing element having software dedicated to end of message indication on a particular port to completely service routing and synchronization requirements resulting from the message,
- a combination hardware and software means for data transfer between processing elements to be managed as circuit switched traffic, store and forward packet switched traffic, or communication load driven combinations of the two approaches, thereby permitting applications which could generate possible deadlocks due to data transfer blocking to include protective features while applications without such risks will not incur a performance penalty,
- software means for operating at an end of message indication for an input end of the message to provide services the application requires, and further including;
- means for holding data and setting status suitable for application program polling,
- means for initiating application programs that have entered wait states, thereby providing I/O synchronization, initiate output message traffic to provide confirmation to sender,
- means for controlling inter-processing element data traffic software which permits applications to embed within I/O control software and further including;
- means for providing a combination of data transfer and data reduction operations that are required to perform operations across parallel processor, including operations where a Vector Inner Product requires a sum reduction involving tree like data transfers in combination with adds, such that an I/O program can perform the complete function, thereby providing the application with the appearance of dedicated hardware,
- means for performing data structure transformations that reshape complex entities, including data arrays when they are moved to the parallel processing array,
- means for imposing logical topologies over physical topologies, including in a machine wired like a hypercube, I/O routing programs can provide a logical NEWS topology for a particular application, and
- means for performing data access conversions including those required for FFT butterfly processing, bit reversed data addressing, and perfect shuffle data transformations.
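Claim 2 names a sum reduction built from tree-like transfers combined with adds (for a Vector Inner Product), bit reversed data addressing, and perfect shuffle transformations. The Python sketch below shows one conventional way each of these can be expressed, simulated sequentially on a single machine. It assumes a hypercube of 2^d nodes with one data slice per node; the function names and the choice of node 0 as the reduction root are illustrative assumptions, not details taken from the patent.

```python
def hypercube_inner_product(x_parts, y_parts):
    """Inner product of vectors distributed over 2**d simulated nodes.

    Each node first forms a local partial dot product. Then, in d steps,
    a sender forwards its partial sum across one hypercube dimension and
    the receiver adds it, so the total accumulates at node 0.
    """
    n = len(x_parts)
    if n == 0 or n & (n - 1) or len(y_parts) != n:
        raise ValueError("need matching lists of 2**d node slices")
    partial = [sum(a * b for a, b in zip(xs, ys))
               for xs, ys in zip(x_parts, y_parts)]
    for k in range(n.bit_length() - 1):
        bit = 1 << k
        for node in range(n):
            # Senders at step k: bit k set and all lower bits clear.
            if (node & bit) and (node & (bit - 1)) == 0:
                partial[node ^ bit] += partial[node]   # transfer plus add
    return partial[0]


def bit_reverse(i, bits):
    """Index i with its low `bits` bits reversed (FFT data reordering)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def perfect_shuffle(i, bits):
    """Destination index of element i under a perfect shuffle of 2**bits
    items: a one-bit left rotation of the index."""
    mask = (1 << bits) - 1
    return ((i << 1) | (i >> (bits - 1))) & mask


x = [[1, 2], [3, 4], [5, 6], [7, 8]]     # four nodes, two values per node
y = [[1, 1], [1, 1], [1, 1], [1, 1]]
total = hypercube_inner_product(x, y)
reversed_order = [bit_reverse(i, 3) for i in range(8)]
shuffle_targets = [perfect_shuffle(i, 3) for i in range(8)]
```

With four nodes the reduction takes two transfer-and-add steps: nodes 1 and 3 send to nodes 0 and 2, then node 2 sends to node 0. On real parallel hardware the transfers within each step would proceed concurrently rather than in a loop.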
Specification