Scalable processor to processor and processor-to-I/O interconnection network and method for parallel processing arrays

US 5,280,474 A
Filed: 01/05/1990
Issued: 01/18/1994
Est. Priority Date: 01/05/1990
Status: Expired due to Fees

Abstract

A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.

11 Claims

1. A multi-stage interconnect network (MIN) for a parallel processor array comprising:
- first, second and third switching stages for forming routing paths between processor elements (PEs) of the parallel processor array, each stage resolving one or more bits of a data routing header; and
  
  address bit duplicating means for duplicating bits resolved in a first stage such that the same bits are again resolved in a later stage to balance data routing loading;
  
  wherein:
  
  each PE is identified as belonging to a cluster of a plurality of PEs;
  
  each cluster is identified as belonging to one of a plurality of PE circuit boards; and
  
  said multi-stage interconnect network is divided into first, second, third and fourth resolving stages for resolving a plurality of route-requesting bits identifying each target PE, the second resolving stage being implemented in said second switching stage for resolving route requests according to the PE board on which the target PE resides, the fourth resolving stage being implemented in said each cluster of PEs for resolving the bits of a route requesting signal according to the location of the target PE within a specified PE cluster, and the first and third resolving stages being implemented in said first and third switching stages respectively for resolving the cluster number of the target PE.

2. The network of claim 1 wherein resolution of bits of the data routing header involves a delay in said first and second switching stages, and said route-requesting signal includes first and second groups of rest bits respectively interposed after the stage-1 resolving bits and after the stage-2 resolving bits for allowing the network to stabilize from effects of the delay in resolving the stage-1 and stage-2 bits in said first and second switching stages.
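
Claims 1 and 2 together imply an ordering of fields in the route-requesting header: cluster bits for stage 1, rest bits, board bits for stage 2, rest bits, the same cluster bits again for stage 3, and finally the bits that select a PE inside its cluster. The Python sketch below is one way to picture that layout, not the patented circuit. The field widths (`BOARD_BITS`, `CLUSTER_BITS`, `PE_BITS`, `REST_BITS`), the zero value used for rest bits, and every function and class name are assumptions made for illustration; the claims do not fix any of them.

```python
from dataclasses import dataclass

# Hypothetical field widths; the claims do not specify them.
BOARD_BITS = 4
CLUSTER_BITS = 6
PE_BITS = 4
REST_BITS = 2  # idle bits after the stage-1 and stage-2 fields (claim 2)


@dataclass(frozen=True)
class PEAddress:
    board: int    # PE circuit board holding the target
    cluster: int  # cluster number of the target
    pe: int       # position of the target PE inside its cluster


def to_bits(value, width):
    """Return value as a list of bits, most significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> i) & 1 for i in reversed(range(width))]


def from_bits(bits):
    """Inverse of to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def build_header(target):
    """Lay out the header so the cluster bits appear twice (stages 1 and 3)."""
    cluster = to_bits(target.cluster, CLUSTER_BITS)
    rest = [0] * REST_BITS  # assumed value; rest bits only provide settling time
    return (
        cluster                               # resolved by stage 1
        + rest
        + to_bits(target.board, BOARD_BITS)   # resolved by stage 2
        + rest
        + cluster                             # duplicated bits, resolved by stage 3
        + to_bits(target.pe, PE_BITS)         # resolved inside the cluster (stage 4)
    )


def resolve_header(header):
    """Consume the header field by field, as successive stages would."""
    pos = 0

    def take(width):
        nonlocal pos
        field = header[pos:pos + width]
        pos += width
        return field

    stage1 = from_bits(take(CLUSTER_BITS))
    take(REST_BITS)
    stage2 = from_bits(take(BOARD_BITS))
    take(REST_BITS)
    stage3 = from_bits(take(CLUSTER_BITS))
    stage4 = from_bits(take(PE_BITS))
    return {
        "stage1_cluster": stage1,
        "stage2_board": stage2,
        "stage3_cluster": stage3,
        "stage4_pe": stage4,
    }
```

Calling `resolve_header(build_header(PEAddress(board=3, cluster=37, pe=9)))` returns the same cluster number for stages 1 and 3, which is the duplication the claim describes.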

3. A global router network for a massively parallel array of processing elements, the routing network comprising a plurality of data-routing stages, wherein each of said data-routing stages comprises:
- a route requesting input wire (RRW-x) for receiving a route-requesting header signal;
  
  a pipeline latch (612) having a data input terminal (D) and a data output terminal (Q);
  
  a first tristate buffer (611) for selectively coupling the route request input wire (RRW-x) to the data input terminal of the pipeline latch (612);
  
  a switching matrix (615) having a router header-in line (621x), horizontal data input lines (650x), vertical output lines (654Y) and switching cells (620) for selectively coupling any one of said horizontal input lines (650x) to one of the vertical output lines (654Y);
  
  a second tristate buffer (652) for selectively coupling the output terminal (Q) of the pipeline latch (612) to said horizontal input line (650x) of the switching matrix (615) during a forward messaging mode;
  
  a third tristate buffer (657) for selectively coupling said horizontal data line (650x) to the data input terminal (D) of the pipeline latch (612) during a reverse messaging mode; and
  
  a fourth tristate buffer (658) for selectively coupling the output terminal (Q) of the pipeline latch (612) to the route requesting wire (RRW-x) during the reverse messaging mode.

4. The interconnect network of claim 3 wherein each switching cell (620) comprises:
- a route selecting switch (623) for selectively connecting its router header input line (621) either through a single inverter (624) or through a noninverting circuit (622, 624) to a horizontal output wire (625);
  
  a wire-group request latch (630) having a request input terminal (D) and a request output terminal (Q);
  
  gating means (629) for coupling a route requesting bit on the horizontal output line (625) to the request input terminal of the wire-group request latch (630) if the wire-group request latch has not been activated by a previous request bit, and for preventing further bits from entering the wire-group request latch if it has already been activated; and
  
  request granting means (631, 632) for receiving a request input signal (631a) from the wire-group request latch and granting said request by connecting (631c) a horizontal data wire (650) to a corresponding vertical data wire (641) if a vertical messaging wire (641) has not already been granted to another route requesting signal.
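
The request and grant behavior recited in claim 4 can be pictured with a small behavioral model. This is a sketch under stated assumptions, not the claimed hardware: the class and method names are hypothetical, and treating a logic 1 on the horizontal output wire as "request asserted" is an assumption the claim does not state.

```python
class VerticalWire:
    """One vertical output line; it can be granted to at most one cell."""

    def __init__(self):
        self.owner = None


class SwitchingCell:
    """Behavioral model of one switching cell (hypothetical names)."""

    def __init__(self, invert, vertical):
        self.invert = invert      # route selecting switch: inverting or not
        self.vertical = vertical  # the vertical wire this cell competes for
        self.request = False      # state of the wire-group request latch
        self.connected = False    # True once horizontal is tied to vertical

    def present_header_bit(self, bit):
        """Feed one header bit through the selecting switch and the gate."""
        routed = (bit ^ 1) if self.invert else bit
        # Gating: bits reach the latch only while it is not yet activated.
        if not self.request:
            self.request = bool(routed)  # assumed: 1 means "request"
        return self.request

    def try_grant(self):
        """Grant the request only if the vertical wire is still free."""
        if self.request and not self.connected and self.vertical.owner is None:
            self.vertical.owner = self
            self.connected = True
        return self.connected
```

In this model two cells that share a `VerticalWire` cannot both end up connected: whichever cell calls `try_grant()` first while holding a latched request takes ownership, and a later requester is refused.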

5. A method for routing data in a global router system between any one processor element (PE) of an array of processor elements (PEs) and any other PE of the array, comprising the steps of:
- providing an interconnection network for establishing data routing paths between a set of source PEs and a set of target PEs;
  
  furnishing said PEs with respective parity identities having precomputed values based on the array addresses of the respective PEs;
  
  generating route requesting signals to be propagated at least in part through said interconnection network from the set of source PEs to the set of target PEs for establishing data carrying routes through said interconnection network in accordance with address information in said route requesting signals, each of said route requesting signals including a protocol bit for indicating to said interconnection network the presence of a route requesting signal;
  
  generating parity bits respectively associated with said route requesting signals for propagating through said interconnection network to indicate respectively an odd or even parity of the addresses in said route requesting signals; and
  
  comparing in each PE of the set of target PEs receiving a parity bit the parity identity thereof with the received parity bit to indicate an error condition in the event the parity identity of said each PE and the parity bit received by said each PE are unequal.

6. A method as in claim 5, further comprising returning signals over said established data carrying routes from the set of target PEs to the set of source PEs to indicate whether a correct set of routes is established in said interconnection network.

7. A method as in claim 6 wherein each of the returning signals indicating a correct route is included in a reverse acknowledge signal in accordance with a route close protocol.

8. A method as in claim 6 wherein each of the returning signals indicating a correct route is included in a reverse message body signal in accordance with a route reverse protocol.

9. A method as in claim 5, further comprising the steps of:
- generating toggle bits respectively associated with said route requesting signals for propagating through said interconnection network, said toggle bits having a particular value; and
  
  detecting the values of said toggle bits after propagation through said interconnection network, an error condition being indicated in the event that one or more of said toggle bits is not equal to said particular value.

10. A method as in claim 5, wherein said route requesting signals include respective PE numbers, further comprising, for each of said PEs receiving a PE number through a route requesting signal, in parallel, the steps of:
- comparing said received PE number with a PE identification number preassigned to said each PE to obtain a match signal indicative of whether said received PE number and said preassigned PE identification number match; and
  
  performing an AND operation with said match signal and the protocol bit received by said each PE to determine whether a valid data routing path is established.
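
The checks recited in claims 5, 9 and 10 can be summarized in a short Python sketch. It is illustrative only: the `RouteRequest` fields, the `ProcessorElement` class, the choice of 1 for `TOGGLE_VALUE`, and the convention that the parity bit is the XOR of the address bits are all assumptions, since the claims name the signals but not their encoding.

```python
from dataclasses import dataclass

TOGGLE_VALUE = 1  # hypothetical "particular value" of the toggle bit (claim 9)


def parity(value):
    """Return 1 if value has an odd number of 1 bits, else 0."""
    return bin(value).count("1") & 1


@dataclass(frozen=True)
class RouteRequest:
    protocol_bit: int  # 1 signals that a route request is present
    pe_number: int     # address of the intended target PE
    parity_bit: int    # parity of that address, generated at the source
    toggle_bit: int    # should arrive equal to TOGGLE_VALUE


def make_request(target_pe):
    """Build the request a source PE would send toward target_pe."""
    return RouteRequest(1, target_pe, parity(target_pe), TOGGLE_VALUE)


class ProcessorElement:
    def __init__(self, pe_id):
        self.pe_id = pe_id
        # Parity identity is precomputed from the PE's own array address.
        self.parity_identity = parity(pe_id)

    def check(self, req):
        """Run the target-side checks on a received route request."""
        parity_error = req.parity_bit != self.parity_identity  # claim 5
        toggle_error = req.toggle_bit != TOGGLE_VALUE          # claim 9
        match = req.pe_number == self.pe_id                    # claim 10
        valid_path = bool(match and req.protocol_bit)          # match AND protocol
        return {
            "parity_error": parity_error,
            "toggle_error": toggle_error,
            "valid_path": valid_path,
        }
```

A request built with `make_request(11)` and delivered to `ProcessorElement(10)` raises the parity error because the two addresses differ in parity. Delivered to `ProcessorElement(7)`, which has the same parity as 11, the parity check alone passes and only the PE-number match of claim 10 reveals the misroute, which suggests why the checks are layered.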

11. In a parallel processor having an array of processor elements, an interconnection network for indirectly routing data from one set of the processor elements to another set of the processor elements comprising:
- a first bidirectional latch having a set of first ports and a set of second ports;
  
  first bidirectional routing path segments respectively connected to the first ports of the first latch, the first routing path segments including a first switch stage responsive to header data from the processor elements for configuring the first routing path segments;
  
  second bidirectional routing path segments respectively connected to the second ports of the first latch, the second routing path segments including a second switch stage responsive to header data from the first switch stage for configuring the second routing path segments;
  
  a second bidirectional latch having a set of first ports and a set of second ports, the first ports thereof being connected to the processor elements and the second ports thereof being respectively connected to the first routing path segments;
  
  a third bidirectional latch having a set of first ports and a set of second ports, the first ports thereof being respectively connected to the second routing path segments, and the second ports thereof being connected to the processor elements;
  
  third and fourth bidirectional routing path segments, wherein:
  
  the first ports of the second bidirectional latch are respectively connected to the processor elements by the third bidirectional routing path segments; and
  
  the second ports of the third bidirectional latch are respectively connected to the processor elements by the fourth bidirectional routing path segments; and
  
  means for operating the first, second and third latches and the processor elements to transfer data between one set of the processor elements and another set of the processor elements in either direction along routing paths comprising the configured first, second, third and fourth routing path segments.

Current Assignee: Kleiner Perkins Caufield-Byers Iv
Original Assignee: MasPar Computer Corporation
Inventors: Kalb, Jeffery C.; Nickolls, John R.; Kim, Won S.; Wegbreit, Eliot; Zapisek, John; Blank, W. Thomas; Van Horn, Kevin
Primary Examiner(s): Olms, Douglas W.
Assistant Examiner(s): Ton, Dang
Application Number: US07/461,492
Time in Patent Office: 1,474 Days
Field of Search: 370/60, 370/85.9, 370/85.11, 370/85.12, 370/85.13, 370/85.14, 370/94.1, 370/94.3, 364/133, 364/200, 364/229.2, 371/11.2, 371/38.1, 371/48, 371/49, 371/149.3, 340/825.02, 340/825.79, 340/825.8, 340/85.85, 395/800, 395/325, 395/425, 395/375
US Class Current: 370/389
CPC Class Codes: G06F 15/17393 having multistage networks,...