Memory access assignment for parallel processing architectures
First Claim
1. A computer-implemented method for configuring a system that does not include hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the method comprising:
- forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph;
forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region;
analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class;
inserting, by the one or more computers, initialization instructions before each memory analysis region identified as a critical region and inserting, by the one or more computers, finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and
assigning, by the one or more computers, sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.
9 Assignments
0 Petitions
Accused Products
Abstract
A system comprises a plurality of computation units interconnected by an interconnection network. A method for configuring the system comprises forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; and assigning memory access instructions a given equivalence class to one of the computation units for execution on the assigned computation unit.
-
Citations
18 Claims
-
1. A computer-implemented method for configuring a system that does not include hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the method comprising:
-
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; inserting, by the one or more computers, initialization instructions before each memory analysis region identified as a critical region and inserting, by the one or more computers, finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and assigning, by the one or more computers, sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer program, stored on a computer-readable device, for configuring a system that does not include hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the computer program comprising instructions for causing a computer system to:
-
form subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; form one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyze each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; insert initialization instructions before each memory analysis region identified as a critical region and inserting finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and assign sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.
-
-
7. A system, comprising:
-
a plurality of interconnected processor devices that do not include hardware support for providing cache coherence among respective caches of the processor devices by maintaining consistency of data stored in the respective caches according to a coherence protocol; and information for configuring the processor devices by forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; inserting initialization instructions before each memory analysis region identified as a critical region and inserting finalization instructions after each memory analysis region identified as a critical region, the initialization instructions comprising instructions for copying memory objects into private memory accessible to a single computation unit, and the finalization instructions comprising instructions for copying memory objects out of the private memory; and assigning sets of instructions corresponding to the memory analysis regions to respective processor devices for execution on the assigned processor device, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the processor devices. - View Dependent Claims (8, 9)
-
-
10. A computer-implemented method for configuring a system that includes hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the method comprising:
-
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; for arrays that have been split into sub-arrays, remapping, by the one or more computers, the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and assigning, by the one or more computers, sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units. - View Dependent Claims (11)
-
-
12. A computer program, stored on a computer-readable device, for configuring a system that includes hardware support for providing cache coherence among respective caches of computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol, the system comprising a plurality of such computation units interconnected by an interconnection network, the computer program comprising instructions for causing a computer system to:
-
form subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; form one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyze each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; for arrays that have been split into sub-arrays, remap the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and assign sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the computation units.
-
-
13. A system, comprising:
-
a plurality of interconnected processor devices that include hardware support for providing cache coherence among respective caches of the processor devices by maintaining consistency of data stored in the respective caches according to a coherence protocol; and information for configuring the processor devices by forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; for arrays that have been split into sub-arrays, remapping the sub-arrays to different cache lines of multiple cache lines, each cache line corresponding to a set of memory addresses that are copied into the cache or evicted from the cache together; and assigning sets of instructions corresponding to the memory analysis regions to respective processor devices for execution on the assigned processor device, with sets of instructions that include memory access instructions belonging to different equivalence classes assigned to different ones of the processor devices. - View Dependent Claims (14, 15)
-
-
16. A computer-implemented method for configuring a system comprising a plurality of computation units interconnected by an interconnection network, the method comprising:
-
forming subsets of instructions corresponding to different portions of a program, the subsets of instructions being related according to a control flow graph; forming one or more memory analysis regions that include one or more of the subsets of instructions, where each subset of instructions is included in a single memory analysis region; analyzing, by one or more computers, each memory analysis region to partition memory objects and instructions that access the memory objects into equivalence classes such that instructions within an equivalence class only access objects in the same equivalence class; identifying, by the one or more computers, one or more memory analysis regions as critical regions based on identifying information in the memory analysis regions that indicates the memory analysis region is performance critical; receiving, by the one or more computers, cache coherence information that indicates whether or not the system includes circuitry to provide cache coherence among respective caches of the computation units by maintaining consistency of data stored in the respective caches according to a coherence protocol; inserting, by the one or more computers, initialization instructions including instructions for copying memory objects into private memory accessible to a single computation unit before each memory analysis region identified as a critical region and inserting, by the one or more computers, finalization instructions including instructions for copying memory objects out of the private memory after each memory analysis region identified as a critical region, if the cache coherence information indicates that the system does not include circuitry to provide cache coherence, but not if the cache coherence information indicates that the system does include circuitry to provide cache coherence; and assigning sets of instructions corresponding to the memory analysis regions to respective computation units for execution on the assigned computation units, with sets of instructions that include memory access instructions belong to different equivalence classes assigned to different ones of the computation units. - View Dependent Claims (17, 18)
-
Specification