Method and apparatus for partitioning programs to balance memory latency
First Claim
Patent Images
1. A method of compiling code, comprising:
- partitioning instructions in the code among a plurality of processors based on memory access latency associated with the instructions by;
partitioning memory access dependence chains into an upstream stage by assigning a first number of desired upstream nodes to the upstream stage, and also assigning instructions in the code on which the first number of desired upstream nodes are dependent to the upstream stage, wherein the first number of desired upstream nodes is N/d where N is a length of the memory access dependence chain and d is a pipelining degree; and
partitioning the memory access dependence chains into a downstream stage by assigning a last number of desired downstream nodes to the downstream stage, and assigning instructions in the code which are dependent on the last number of desired downstream nodes to the downstream stage, wherein the last number of desired downstream nodes is N*(d−
1)/d;
performing the partitioning a plurality of times with subsequent partitioning being performed on the instructions assigned to the downstream stage.
1 Assignment
0 Petitions
Accused Products
Abstract
A method of compiling code that includes partitioning instructions in the code among a plurality of processors based on memory access latency associated with the instructions is disclosed. According to one aspect of the invention, partitioning instructions includes partitioning memory access dependence chains. Other embodiments are described and claimed.
37 Citations
15 Claims
-
1. A method of compiling code, comprising:
-
partitioning instructions in the code among a plurality of processors based on memory access latency associated with the instructions by; partitioning memory access dependence chains into an upstream stage by assigning a first number of desired upstream nodes to the upstream stage, and also assigning instructions in the code on which the first number of desired upstream nodes are dependent to the upstream stage, wherein the first number of desired upstream nodes is N/d where N is a length of the memory access dependence chain and d is a pipelining degree; and partitioning the memory access dependence chains into a downstream stage by assigning a last number of desired downstream nodes to the downstream stage, and assigning instructions in the code which are dependent on the last number of desired downstream nodes to the downstream stage, wherein the last number of desired downstream nodes is N*(d−
1)/d;performing the partitioning a plurality of times with subsequent partitioning being performed on the instructions assigned to the downstream stage. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. An article of manufacture comprising a non-transitory machine accessible medium including sequences of instructions, the sequences of instructions including instructions which when executed cause the machine to perform:
-
partitioning instructions in code among a plurality of processors based on memory access latency associated with the instructions by; partitioning memory access dependence chains into an upstream stage by assigning a first number of desired upstream nodes to the upstream stage, and also assigning instructions in the code on which the first number of desired upstream nodes are dependent on to the upstream stage, wherein the first number of desired upstream nodes is N/d where N is a length of the memory access dependence chain and d is a pipelining degree; and partitioning the memory access dependence chains into a downstream stage by assigning a last number of desired downstream nodes to the downstream stage, and assigning instructions in the code which are dependent on the last number of desired downstream nodes to the downstream stage, wherein the last number of desired downstream nodes is N*(d−
1)/d;performing the partitioning a plurality of times with subsequent partitioning being performed on the instructions assigned to the downstream stage. - View Dependent Claims (8)
-
-
9. A code analysis unit implemented on a processor, comprising:
-
a dependence information unit to identify dependencies between instructions in code; and a code partitioning unit to partition instructions in the code into a plurality of pipeline stages to be executed by a plurality of processors based on memory access latency associated with the instructions by partitioning memory access dependence chains into an upstream stage by assigning a first number of desired upstream nodes to the upstream stage, and also assigning instructions in the code on which the first number of desired upstream nodes are dependent to the upstream stage, wherein the first number of desired upstream nodes is N/d where N is a length of the memory access dependence chain and d is a pipelining degree; and partitioning the memory access dependence chains into a downstream stage by assigning a last number of desired downstream nodes to the downstream stage, and assigning instructions in the code which are dependent on the last number of desired downstream nodes to the downstream stage, wherein the last number of desired downstream nodes is N*(d−
1)/d;performing the partitioning a plurality of times with subsequent partitioning being performed on the instructions assigned to the downstream stage. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
Specification