Method and system for converting a single-threaded software program into an application-specific supercomputer
First Claim
1. A general-purpose supercomputer for performing parallel execution of parallel software compiled from a code fragment within a single-threaded software application, where the general-purpose supercomputer comprises:
a. a plurality of general-purpose processors;
b. one or more task networks connected to the plurality of general-purpose processors, where each task network among the one or more task networks:
allows a first general-purpose processor on the task network to send a task invocation request to a second general-purpose processor on the task network, and
allows the first general-purpose processor on the task network to receive back either a task result message or a task completion acknowledgement from the second general-purpose processor on the task network;
c. at least one hardware synchronization unit to ensure that if a memory instruction instance I2 is dependent on a memory instruction instance I1 in sequential execution of the code fragment within the single-threaded software application, the memory instruction instance I2 is executed after the memory instruction instance I1 in the parallel execution of the parallel software performed by the general-purpose supercomputer; and
d. at least one coherent memory hierarchy, which:
(i) supports a plurality of load/store ports that are accessed by the plurality of general-purpose processors in parallel; and
(ii) signals a completion of each memory instruction issued from each load/store port of the plurality of load/store ports, for supporting synchronization units;
where the parallel execution of the parallel software by the general-purpose supercomputer is functionally equivalent to the sequential execution of the code fragment within the single-threaded software application; and
where the general-purpose supercomputer is implemented as a plurality of copies of a union module implemented in ASIC technology, with scalable network connections, and where the union module implemented in ASIC technology is able to perform function of any of a plurality of modules resulting from partitioning a hardware design of the general-purpose supercomputer.
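The ordering requirement in element (c), together with the completion signal in element (d)(ii), can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: two Python threads stand in for two general-purpose processors, a `threading.Event` stands in for the hardware synchronization unit, and the names `ToyMemory`, `run_parallel`, and `run_sequential` are hypothetical. Instruction instance I1 is a store, I2 is a dependent load, and I2 is held back until the memory signals that I1 has completed.

```python
import threading


class ToyMemory:
    """Hypothetical shared memory that signals completion of each store."""

    def __init__(self):
        self.cells = {}
        self.lock = threading.Lock()

    def store(self, addr, value, done):
        with self.lock:
            self.cells[addr] = value
        done.set()  # completion signal consumed by the synchronization unit

    def load(self, addr):
        with self.lock:
            return self.cells.get(addr, 0)


def run_parallel():
    """Run I1 (store) and I2 (dependent load) on two threads."""
    mem = ToyMemory()
    i1_done = threading.Event()  # assumption: stands in for the sync unit
    result = []

    def processor_a():  # executes I1: store 42 to address 0
        mem.store(0, 42, i1_done)

    def processor_b():  # executes I2: load from address 0, depends on I1
        i1_done.wait()  # I2 may not execute before I1 has completed
        result.append(mem.load(0))

    # Start the consumer first to show that start order does not matter.
    threads = [threading.Thread(target=processor_b),
               threading.Thread(target=processor_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


def run_sequential():
    """Reference: the same two instructions in single-threaded order."""
    cells = {}
    cells[0] = 42    # I1
    return cells[0]  # I2
```

Calling `run_parallel()` returns the same value as `run_sequential()` regardless of which thread is scheduled first, which is the functional-equivalence property the claim requires of the parallel execution.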
Abstract
The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) generating interfaces on the hardware and software parts of the application, which (i) perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.
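The partitioning and union-chip step can be pictured with a toy resource model. The abstract does not say how the union chip is constructed, so the following is a sketch under an explicit assumption: each partition is described as a list of component kinds, and the union holds, for every kind, the largest count any single partition needs. The function names `build_union` and `can_realize` and the component names in the usage note are hypothetical.

```python
from collections import Counter


def build_union(partitions):
    """Assumed construction: per component kind, keep the maximum
    count required by any one partition."""
    union = Counter()
    for partition in partitions:
        for kind, count in Counter(partition).items():
            union[kind] = max(union[kind], count)
    return union


def can_realize(union, partition):
    """True if the union holds enough of every component kind
    to act as the given partition."""
    need = Counter(partition)
    return all(union[kind] >= count for kind, count in need.items())
```

For example, with `partitions = [["adder", "adder", "mult"], ["mult", "mult", "fifo"]]`, `build_union(partitions)` holds two adders, two multipliers, and one FIFO, and `can_realize` is true for both partitions. Under this assumption, identical copies of the one union design can each be configured as a different partition.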
57 Citations
4 Claims
Specification