Method and system for converting a single-threaded software program into an application-specific supercomputer

US 10,146,516 B2
Filed: 09/06/2016
Issued: 12/04/2018
Est. Priority Date: 11/15/2011
Status: Active Grant



Abstract

The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.


6 Claims

1. A method, implemented by a compiler, to create an incomplete butterfly sub-network, where r >= 2 is a radix of the incomplete butterfly sub-network and is a power of two; and

  where a number of input ports m >= 1 of the incomplete butterfly sub-network is not a power of r, and/or a number of output ports n >= 1 of the incomplete butterfly sub-network is not a power of r; and

  where the incomplete butterfly sub-network is obtained from a corresponding complete butterfly sub-network consisting of multiplexers, buffers, and wires, where a number of input ports and a number of output ports of the corresponding complete butterfly sub-network are both equal to r^d, where d is a smallest integer that makes r^d greater than or equal to a maximum of m and n, by:

  retaining, in the incomplete butterfly sub-network, only multiplexers, buffers, and wires of the corresponding complete butterfly sub-network required for routing packets from a first m input ports of the corresponding complete butterfly sub-network to a first n output ports of the corresponding complete butterfly sub-network, and deleting any remaining multiplexers, buffers, and wires of the corresponding complete butterfly sub-network; and

  where the compiler automatically translates a single-threaded software program code fragment into a partitioned application-specific supercomputer functionally equivalent to the single-threaded software program code fragment, in part by creating one or more customized incomplete butterfly sub-networks for scalable message communication between hardware components of the partitioned application-specific supercomputer, where each customized incomplete butterfly sub-network among the one or more customized incomplete butterfly sub-networks has a minimum number of input ports, a minimum number of output ports, and a minimum number of payload bits per port for reducing area, power, and message communication latency.

2. An incomplete butterfly sub-network created by the method of claim 1.
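
The pruning step in claim 1 can be illustrated with a small model. The sketch below is not the patented compiler: it assumes a conventional butterfly in which stage s corrects one base-r digit of the packet's position toward the destination (most significant digit first), and it models wires only, not multiplexers or buffers. The names `smallest_d`, `set_digit`, and `incomplete_butterfly` are hypothetical and chosen for illustration.

```python
def smallest_d(r, m, n):
    """Smallest integer d such that r**d >= max(m, n)."""
    d = 0
    while r ** d < max(m, n):
        d += 1
    return d


def set_digit(x, k, v, r):
    """Return x with its k-th base-r digit (0 = least significant) set to v."""
    old = (x // r ** k) % r
    return x + (v - old) * r ** k


def incomplete_butterfly(r, m, n):
    """Return (d, wires) for an m-input, n-output incomplete butterfly.

    Each wire is a tuple (stage, from_position, to_position). Only wires
    lying on a path from one of the first m inputs to one of the first n
    outputs of the complete r**d x r**d butterfly are retained.
    Assumption: destination-digit routing, most significant digit first.
    """
    d = smallest_d(r, m, n)
    wires = set()
    for src in range(m):
        for dst in range(n):
            pos = src
            for stage in range(d):
                k = d - 1 - stage                  # digit fixed at this stage
                nxt = set_digit(pos, k, (dst // r ** k) % r, r)
                wires.add((stage, pos, nxt))       # this wire is needed
                pos = nxt
    return d, wires


if __name__ == "__main__":
    d, wires = incomplete_butterfly(2, 3, 3)
    complete = d * (2 ** d) * 2                    # wires in the complete network
    print(d, len(wires), complete)
```

Under these assumptions, a radix-2 network with m = 3 and n = 3 is derived from the 4 x 4 complete butterfly (d = 2), and every wire that no retained path uses is simply never added to the set, which corresponds to the "deleting" step of the claim.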

3. A method, implemented by a compiler, to create a task sub-network, where the task sub-network is scalable, is built entirely in hardware, does not include any processors executing software instructions, and serves to load-balance and distribute packets representing tasks to homogeneous hardware resources able to perform tasks; and

  where the task sub-network has one or more input ports and a plurality of output ports; and
  
  where a packet entering the task sub-network on an input port does not indicate any output port the packet should be routed to, but the packet is routed to any output port able to accept the packet; and
  
  where the compiler automatically translates a single-threaded software program code fragment into a partitioned application-specific supercomputer functionally equivalent to the single-threaded software program code fragment, in part by creating one or more customized task sub-networks for scalable message communication between hardware components of the partitioned application-specific supercomputer, where each customized task sub-network among the one or more customized task sub-networks has a minimum number of input ports, a minimum number of output ports, and a minimum number of payload bits per port for reducing area, power, and message communication latency.

4. The method of claim 3, further comprising creating the task sub-network structured as a torus of one or more dimensions, where a packet, after entering the task sub-network from an input port, travels through the torus until the packet encounters an output port which is able to accept the packet.

5. The method of claim 4, further comprising creating the task sub-network, where each node of the torus is a task crossbar switch, where the task crossbar switch:

  has one or more input ports and one or more output ports; and

  implements a packet routing algorithm based on attempting to match each input port of the task crossbar switch with a packet to an output port of the task crossbar switch able to accept the packet, as specified below:

    for each input port i of the task crossbar switch in increasing input port order,
      if there is a packet at the input port i of the task crossbar switch, then,
        if there is an output port j of the task crossbar switch, where (a) the output port j of the task crossbar switch is able to accept a packet, and (b) no packet has already been routed to the output port j of the task crossbar switch, and (c) j is a smallest output port number for which (a) and (b) are both true;
        then, the packet at the input port i of the task crossbar switch is routed to the output port j of the task crossbar switch;
        else, the packet at the input port i of the task crossbar switch is not routed and remains at the input port i of the task crossbar switch;
        end if;
      end if;
    end for;

  where routing of packets within the task crossbar switch is done with a parallel implementation of the packet routing algorithm; and

  where an input port of the task crossbar switch which is also an input port of the task sub-network numerically precedes any input port of the task crossbar switch which is not also an input port of the task sub-network; and

  where an output port of the task crossbar switch which is also an output port of the task sub-network numerically precedes any output port of the task crossbar switch which is not also an output port of the task sub-network.

6. A task sub-network created by the method of claim 5.
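
The routing rules of claims 4 and 5 can be read as a greedy matching. The sketch below is a sequential software reference only: claim 5 states that the hardware uses a parallel implementation, which is not modeled here. The function names, the boolean-list representation of ports, and the one-dimensional ring used for the torus are all assumptions made for illustration.

```python
def route_crossbar(has_packet, can_accept):
    """Sequential reference for the matching described in claim 5.

    has_packet[i] is True if input port i holds a packet.
    can_accept[j] is True if output port j is able to accept a packet.
    Returns a dict mapping each routed input port to its output port.
    Inputs absent from the dict keep their packet for a later attempt.
    """
    taken = [False] * len(can_accept)      # outputs already given a packet
    routes = {}
    for i, present in enumerate(has_packet):       # increasing input order
        if not present:
            continue
        for j, ok in enumerate(can_accept):        # smallest eligible j
            if ok and not taken[j]:
                taken[j] = True
                routes[i] = j
                break
    return routes


def ring_deliver(entry, node_can_accept):
    """One-dimensional torus (ring) reading of claim 4, as a simplification.

    A packet enters at node `entry` and moves to the next node each hop
    until it meets a node whose output can accept it. Returns
    (node, hops), or None if no node accepts within one full trip
    (hardware would keep the packet circulating instead).
    """
    k = len(node_can_accept)
    for hops in range(k):
        node = (entry + hops) % k
        if node_can_accept[node]:
            return node, hops
    return None


if __name__ == "__main__":
    print(route_crossbar([True, False, True, True], [False, True, True]))
    print(ring_deliver(2, [True, False, False, False]))
```

In the first call, inputs 0 and 2 are matched to outputs 1 and 2, and input 3 stays unrouted because no eligible output remains. Numbering the sub-network's own ports before the internal torus ports, as the last two clauses of claim 5 require, means this lowest-number-first search prefers delivering a task to a local output over forwarding it around the torus.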


Current Assignee: Global Supercomputing Corporation
Original Assignee: Global Supercomputing Corporation
Inventors: Ebcioglu, Kemal; Kultursay, Emre
Primary Examiner(s): Chen, Qing
Application Number: US15/257,319
Publication Number: US 20170017476A1
Time in Patent Office: 819 Days
Field of Search: 717136-161, 716105, 716116, 716117, 716124, 716125, 716128, 716131
US Class Current
CPC Class Codes

G06F 12/08   in hierarchically structure...

G06F 12/0862   with prefetch

G06F 12/0875   with dedicated cache, e.g. ...

G06F 12/0895   of parts of caches, e.g. di...

G06F 15/17381   Two dimensional, e.g. mesh,...

G06F 2115/10   Processors

G06F 2212/455   Image or video data

G06F 2212/6026   Prefetching based on access...

G06F 30/30   Circuit design

G06F 30/323   Translation or migration, e...

G06F 30/392   Floor-planning or layout, e...

G06F 8/40   Transformation of program code

G06F 8/4452   Software pipelining

G06F 8/452   Loops

G06F 9/52   Program synchronisation; Mu...

Y02D 10/00   Energy efficient computing,...
