Method and system for converting a single-threaded software program into an application-specific supercomputer
First Claim
1. A directory-mapped coherent cache memory hierarchy with a write-update protocol of a partitioned application-specific supercomputer comprising a memory including a plurality of ports and a cache, where:
a. whenever a store instruction is executed to a shared line in the cache, all copies of the shared line in other caches are automatically updated with the same store instruction;
b. the automatic update is accomplished with message communication, over a plurality of scalable networks, only among caches having the shared line, and a directory unit responsible for the shared line in the caches, therefore reducing communication overhead; and
c. a compiler guarantees, by introducing synchronization actions, that any two load/store instructions, one of which is a store instruction, which can simultaneously arrive at a cache line, are independent, therefore allowing re-ordering of load/store instructions in the plurality of scalable networks, and simplifying cache hardware;
where the compiler automatically translates a single-threaded software program code fragment into the partitioned application-specific supercomputer functionally equivalent to the single-threaded software program code fragment, in part by creating one or more customized coherent cache memory hierarchies with the write-update protocol, and where each among the one or more customized coherent cache memory hierarchies with the write-update protocol has a minimum number of ports and data width per port for reducing memory area and power consumption.
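As an illustration of elements (a) and (b) of the claim, the following is a minimal Python sketch of a directory-tracked write-update scheme. It is not the patented hardware: the class names `Cache` and `Directory`, the per-line dictionaries, and the returned message count are all hypothetical, and the directory keeping a backing copy of each line is a simplification made only so that later misses observe earlier stores. The point it shows is that a store to a shared line is replayed only in the caches that the directory lists as holding that line.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """One private cache (hypothetical model): line address -> {offset: value}."""
    name: str
    lines: dict = field(default_factory=dict)
    updates_received: int = 0

    def apply_update(self, line_addr, offset, value):
        # A remote store is replayed on the local copy of the shared line.
        self.lines[line_addr][offset] = value
        self.updates_received += 1


class Directory:
    """Hypothetical directory unit: records which caches hold each line."""

    def __init__(self):
        self.memory = {}   # line address -> {offset: value}; simplified backing copy
        self.sharers = {}  # line address -> list of caches holding the line

    def load(self, cache, line_addr, offset):
        if line_addr not in cache.lines:
            # Miss: fetch the line and register the cache as a sharer.
            cache.lines[line_addr] = dict(self.memory.get(line_addr, {}))
            self.sharers.setdefault(line_addr, []).append(cache)
        return cache.lines[line_addr].get(offset, 0)

    def store(self, cache, line_addr, offset, value):
        if line_addr not in cache.lines:
            self.load(cache, line_addr, offset)
        cache.lines[line_addr][offset] = value
        self.memory.setdefault(line_addr, {})[offset] = value
        # Write-update: forward the same store only to the other sharers.
        messages = 0
        for other in self.sharers[line_addr]:
            if other is not cache:
                other.apply_update(line_addr, offset, value)
                messages += 1
        return messages


directory = Directory()
a, b, c = Cache("A"), Cache("B"), Cache("C")
directory.load(a, 0x40, 0)   # A and B share line 0x40; C never touches it
directory.load(b, 0x40, 0)
sent = directory.store(a, 0x40, 0, 7)
```

In this run, cache B receives the update and cache C receives no message, which is the communication saving that element (b) describes. Element (c), the compiler-inserted synchronization that makes simultaneously arriving accesses independent, is not modeled here.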
Abstract
The invention comprises (i) a compilation method for automatically converting a single-threaded software program into an application-specific supercomputer, and (ii) the supercomputer system structure generated as a result of applying this method. The compilation method comprises: (a) Converting an arbitrary code fragment from the application into customized hardware whose execution is functionally equivalent to the software execution of the code fragment; and (b) Generating interfaces on the hardware and software parts of the application, which (i) Perform a software-to-hardware program state transfer at the entries of the code fragment; (ii) Perform a hardware-to-software program state transfer at the exits of the code fragment; and (iii) Maintain memory coherence between the software and hardware memories. If the resulting hardware design is large, it is divided into partitions such that each partition can fit into a single chip. Then, a single union chip is created which can realize any of the partitions.
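The interface steps (b)(i) to (b)(iii) of the abstract can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the function `run_fragment_in_hardware`, the `live_in` and `live_out` lists, and the copy-then-write-back treatment of memory are hypothetical stand-ins, and the "hardware" is just a Python callable. The sketch only shows the order of events: state goes in at the fragment entry, the fragment runs, and state and memory changes come back at the exit.

```python
def run_fragment_in_hardware(fragment, sw_registers, sw_memory, live_in, live_out):
    """Model one invocation of an accelerated code fragment (hypothetical)."""
    # (i) Entry: software-to-hardware transfer of the live-in program state.
    hw_registers = {name: sw_registers[name] for name in live_in}

    # (iii) Coherence, modeled crudely: the hardware works on a private copy
    # of memory, and changed words are written back at the exit.
    hw_memory = dict(sw_memory)

    fragment(hw_registers, hw_memory)

    # (ii) Exit: hardware-to-software transfer of the live-out program state.
    for name in live_out:
        sw_registers[name] = hw_registers[name]
    for addr, value in hw_memory.items():
        if sw_memory.get(addr) != value:
            sw_memory[addr] = value


def sum_array_fragment(regs, mem):
    # Stand-in for the hardware version of a loop summing n words at `base`.
    total = regs["total"]
    for i in range(regs["n"]):
        total += mem[regs["base"] + i]
    regs["total"] = total
    mem[regs["out"]] = total


registers = {"total": 0, "n": 3, "base": 100, "out": 200, "unrelated": 42}
memory = {100: 1, 101: 2, 102: 3, 200: 0}
run_fragment_in_hardware(sum_array_fragment, registers, memory,
                         live_in=["total", "n", "base", "out"],
                         live_out=["total"])
```

After the call, `registers["total"]` and `memory[200]` both hold the sum, while `registers["unrelated"]` is untouched because it was never transferred. Partitioning and the union chip are outside the scope of this sketch.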
22 Citations
2 Claims
Specification