High performance computer system
Abstract
A parallel processor comprised of a plurality of processing nodes (10), each node including a processor (100-114) and a memory (116). Each processor includes means (100, 102) for executing instructions, logic means (114) connected to the memory for interfacing the processor with the memory and means (112) for internode communication. The internode communication means (112) connect the nodes to form a first array (8) of order n having a hypercube topology. A second array (21) of order n having nodes (22) connected together in a hypercube topology is interconnected with the first array to form an order n+1 array. The order n+1 array is made up of the first and second arrays of order n, such that a parallel processor system may be structured with any number of processors that is a power of two. A set of I/O processors (24) is connected to the nodes of the arrays (8, 21) by means of I/O channels (106). The means for internode communication (112) comprises a serial data channel driven by a clock that is common to all of the nodes.
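The way two order n arrays combine into one order n+1 array can be illustrated with a minimal Python sketch. It assumes the conventional binary labelling of hypercube nodes, in which two nodes are linked when their labels differ in exactly one bit; the abstract does not state a labelling, and the function names `neighbors` and `join_cubes` are hypothetical.

```python
def neighbors(node: int, order: int) -> list[int]:
    """Return the nodes adjacent to `node` in a hypercube of the given order."""
    # Assumption: binary labels, one link per dimension (flip one address bit).
    return [node ^ (1 << d) for d in range(order)]


def join_cubes(order: int) -> dict[int, list[int]]:
    """Build an order n+1 hypercube from two order-n hypercubes.

    Labels 0 .. 2**order - 1 form the first array. The second array uses the
    same labels with one extra high bit set. Each node keeps its links inside
    its own array and gains one link to its counterpart in the other array.
    """
    size = 2 ** order
    links = {}
    for node in range(2 * size):
        local = node % size            # label inside its own order-n array
        base = node - local            # 0 for the first array, `size` for the second
        within = [base + m for m in neighbors(local, order)]
        across = node ^ size           # same label in the other array
        links[node] = within + [across]
    return links
```

Under this labelling, joining two order 2 arrays gives every one of the 8 nodes exactly 3 links, which is the same wiring as a directly built order 3 hypercube.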
12 Claims
1. A parallel processor comprising in combination:
a plurality of first processing nodes;
a single oscillator clock common to all of said first processing nodes;
each of said first processing nodes including a processor and a memory, said memory having data and instructions stored therein, said processor including (1) executing means for executing said instructions, (2) fetching means connected to said execution means and to said memory for fetching said instructions from said memory, and (3) internode communication means connected to said execution means and to said memory;
said internode communication means comprising an asynchronous I/O channel for fetching data from said memory at an address supplied by said I/O channel and for sending said data to another one of said plurality of first processing nodes, said asynchronous I/O channel being connected to and driven by said single oscillator clock; and
first means, connected to each of said internode communication means of said first nodes, for interconnecting said first nodes in the structure of a first array of processing nodes, said first array having a hypercube topology.

4. A parallel processor array comprising:
a plurality of array boards (1 to k);
a first one of said array boards being comprised of m processing nodes, each one of said m processing nodes including a memory for storing data and instructions, means for fetching and executing said instructions, and p I/O channels, there being m such nodes on said first one of said array boards;
each of said p I/O channels at each one of said m processing nodes comprising an asynchronous I/O channel for fetching data from said memory to an address supplied by said I/O channel and for sending said data to another one of said m processing nodes;
means for interconnecting said m nodes on said first board in an order n hypercube comprised of 2^n = m processing nodes; said interconnecting means utilizing n of the p I/O channels to effectuate the interconnections among said nodes; and
a backplane; said backplane including first means for receiving said processor boards; said backplane including second means for interconnecting said K processor boards in an order P hypercube, where K=2^j, m=2^n, and P=j+n.

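The relation K=2^j, m=2^n, P=j+n can be checked with a short Python sketch. The board and node counts in the usage note are illustrative only, and the address layout (board index in the high j bits, on-board node index in the low n bits) is an assumption that the claim does not state. All function names are hypothetical.

```python
def hypercube_order(count: int) -> int:
    """Return the order of a hypercube with `count` nodes (count must be a power of two)."""
    if count < 1 or count & (count - 1):
        raise ValueError(f"{count} is not a power of two")
    return count.bit_length() - 1


def describe_system(boards: int, nodes_per_board: int) -> dict[str, int]:
    """Compute j, n and P for K = boards and m = nodes_per_board."""
    j = hypercube_order(boards)            # K = 2^j
    n = hypercube_order(nodes_per_board)   # m = 2^n
    return {"j": j, "n": n, "P": j + n, "total_nodes": boards * nodes_per_board}


def split_address(node: int, n: int) -> tuple[int, int]:
    """Split a global node address into (board index, on-board node index).

    Assumption: the low n bits select the node on a board and the remaining
    high bits select the board.
    """
    return node >> n, node & ((1 << n) - 1)


def link_kind(dimension: int, n: int) -> str:
    """Under the assumed layout, dimensions below n stay on the board; the rest cross the backplane."""
    return "board" if dimension < n else "backplane"
```

For example, `describe_system(16, 64)` describes 16 boards of 64 nodes each: j=4, n=6, and an order P=10 hypercube of 1024 nodes.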
6. A parallel processor array comprising:
a plurality of array boards (1 to k);
a first one of said array boards being comprised of m processing nodes, each one of said m processing nodes including a local memory for storing data and instructions, means for fetching and executing said instructions, and p I/O channels, there being m such nodes on said first one of said array boards;
means for interconnecting said m nodes on said first board in an order n hypercube comprised of 2^n = m processing nodes; said interconnecting means utilizing n of the p I/O channels to effectuate the interconnections among said nodes;
a backplane; said backplane including first means for receiving said processor boards; said backplane including second means for interconnecting said K processor boards in an order P hypercube, where K=2^j, m=2^n, and P=j+n;
a plurality of system control boards (1 to x); each one of said system control boards being comprised of r dual-ported processing nodes, each one of said r dual-ported processing nodes including a processor, a local dual-ported memory, a plurality of system host channels (1 to v), and a plurality of I/O channels (1 to s); and
first means for interconnecting said r dual-ported processing nodes on said system control board in an order t hypercube comprised of 2^t = r dual-ported processing nodes on each system control board; said interconnecting means utilizing t of the s I/O channels to effectuate the interconnections among said nodes;
said backplane including third means for receiving said system control boards; said backplane including fourth means for interconnecting said x system control boards in an order s hypercube of dual-ported processing nodes, where x=2^u, r=2^t, and s=t+u; said v system host channels being made available at said backplane for use in communication with said processing nodes on said array boards.

7. A parallel processor comprising in combination:
a plurality of first processing nodes;
a plurality of second processing nodes;
a clock common to all of said first and second processing nodes;
each of said first and second nodes including a processor and a memory, each of said processors including (1) execution means for executing said instructions, (2) internode communication means connected to said execution means and to said memory;
said internode communication means comprising a data channel connected to and driven by said clock;
first means, connected to each of said internode communication means of said first nodes, for interconnecting said first nodes in the structure of a first array of processing nodes, said first array having a hypercube topology;
second means, connected to each of said internode communication means of said second nodes, for interconnecting said second nodes in the structure of a second array of processing nodes, said second array having a hypercube topology;
said first and second arrays each being of order n;
third means, connected to each of said first and second nodes, for interconnecting said first array and said second array together to form an order n+1 array of which said first and second arrays are a subset, and wherein said order n+1 array is made up of said first and second arrays of order n, such that a parallel processor system is structured with a number of processors that is a power of two;
a first number of unidirectional direct memory access (DMA) output channels connected to said execution means on each of said processors;
a second number of unidirectional direct memory access (DMA) input channels connected to each of said execution means on each of said processors;
each of said DMA channels including two multibit registers, an address pointer register for a message buffer location in memory, and a byte count register indicating the number of bytes left to send or receive;
a first subset of said I/O channels being used for communicating with a host, a second subset of said I/O channels being used for communicating within said order n+1 array;
each of said I/O channels having an address pointer register, a byte count register, and a "ready" flag;
means for transmitting a message having a start bit, a message unit, and a parity bit, said transmitting means including means in said execution means for executing a LPTR (Load Pointer) instruction having a first operand and a second operand, said LPTR instruction executing means further including means for setting said address pointer register to point to the low byte of the first message unit in said message buffer in said memory, said first operand of said LPTR instruction being the address of said message buffer and the second operand of said LPTR instruction being an integer whose value determines which of said address registers is to be loaded;
means in said execution means for executing a LCNT (Load Count) instruction having a first operand and a second operand, said first operand of said LCNT instruction being an integer (the count value) equal to the number of bytes in said message and said second operand being a value that indicates which of said byte count registers is to be loaded;
means operative as each message is sent for incrementing said address register and decrementing said count; and
means operative upon the condition that said byte count is zero for stopping message transmission, and for setting said ready flag.

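The register behaviour recited in claim 7 (load the address pointer with LPTR, load the byte count with LCNT, advance both as each unit is sent, stop and set the ready flag at zero) can be sketched as a small Python model. This is an illustration under stated assumptions, not the patented hardware: the class and method names are hypothetical, the message unit size of 2 bytes is an assumption, and the start bit and parity bit framing is omitted.

```python
class DmaChannels:
    """Toy model of a node's DMA output channels (hypothetical names)."""

    def __init__(self, memory: bytearray, channel_count: int, unit_bytes: int = 2):
        self.memory = memory
        self.unit_bytes = unit_bytes               # assumed message unit size
        self.address = [0] * channel_count         # address pointer registers
        self.count = [0] * channel_count           # byte count registers
        self.ready = [True] * channel_count        # "ready" flags

    def lptr(self, buffer_address: int, channel: int) -> None:
        """LPTR: first operand is the buffer address, second selects the register."""
        self.address[channel] = buffer_address

    def lcnt(self, byte_count: int, channel: int) -> None:
        """LCNT: first operand is the byte count, second selects the register."""
        self.count[channel] = byte_count
        self.ready[channel] = byte_count == 0      # busy until the count reaches zero

    def send_unit(self, channel: int) -> bytes:
        """Send one message unit, then increment the address and decrement the count."""
        size = min(self.unit_bytes, self.count[channel])
        start = self.address[channel]
        unit = bytes(self.memory[start:start + size])
        self.address[channel] += size
        self.count[channel] -= size
        if self.count[channel] == 0:
            self.ready[channel] = True             # transmission stops here
        return unit

    def send_all(self, channel: int) -> list[bytes]:
        """Keep sending units until the ready flag is set."""
        units = []
        while not self.ready[channel]:
            units.append(self.send_unit(channel))
        return units
```

A caller would issue `lptr(address, channel)` followed by `lcnt(length, channel)` and then poll the channel's ready flag, which mirrors the order of operations in the claim.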
10. For use in a parallel processor array comprising a plurality of processor array boards (1 to k), and a clock board having a single oscillator thereon for providing clock lines, said clock lines being driven by said single oscillator,
said processor array boards being comprised of m processing nodes, each one of said m processing nodes including a local memory for storing data and instructions, means for fetching and executing said instructions, and p I/O channels, there being m such nodes on said processor array boards;
each of said p I/O channels at each one of said m processing nodes comprising an asynchronous I/O channel for fetching data from said memory at an address supplied by said I/O channel and for sending said data to another one of said m processing nodes; and
means for interconnecting said m nodes on said processor array board in an order n hypercube comprised of 2^n = m processing nodes; said interconnecting means utilizing n of the p channels to effectuate the interconnections among said nodes,
a backplane comprising:
first means for receiving said K processor array boards;
second means for interconnecting said K processor array boards in an order P hypercube, where n is the order of the hypercube on each of said array boards and where K=2^j and P=n+j;
third means for receiving said clock board; and
fourth means for connecting said clock lines to said array boards.

11. For use in a parallel processor array comprising a plurality of processor array boards (1 to k), and a clock board for providing clock lines,
said processor array boards being comprised of m processing nodes, each one of said m processing nodes including a local memory for storing data and instructions, means for fetching and executing said instructions, and p I/O channels, there being m such nodes on said array boards; and
means for interconnecting said m nodes on said processor array board in an order n hypercube comprised of 2^n = m processing nodes; said interconnecting means utilizing n of the p channels to effectuate the interconnections among said nodes,
a backplane comprising:
first means for receiving said K processor array boards;
second means for interconnecting said K processor array boards in an order P hypercube, where n is the order of the hypercube on each of said array boards and where K=2^j and P=n+j;
third means for receiving said clock board;
fourth means for connecting said clock lines to said array boards;
said parallel processor array further including a plurality of system control boards (1 to x);
fifth means for receiving said x system control boards; and
sixth means for interconnecting said x system control boards into an order s hypercube, where t is the order of the hypercube on each of said system control boards and where x=2^u and s=t+u.

12. The backplane as set forth in accordance with claim 13 wherein said processing nodes on said processor array boards each include a system host channel, and wherein said system control boards are comprised of r dual-ported processing nodes, each one of said r dual-ported processing nodes on said system control boards including v system host channels, said backplane further comprising:
seventh means for interconnecting said system host channels on said k array boards to said system host channels on said x system control boards.
Specification