Efficient explicit data prefetching analysis and code generation in a low-level optimizer for inserting prefetch instructions into loops of applications
First Claim
1. A compiler, comprising:
- means in a low level optimizer for analyzing and efficiently inserting explicit data prefetch instructions into loops of applications;
subscript expression analysis means for determining data prefetching requirements;
means for recognizing cache line reuse patterns across loop iterations to eliminate unnecessary prefetch instructions; and
means for limiting insertion of explicit prefetch instructions to situations where a lower bound on an achievable loop iteration latency is unlikely to be increased as a result of said prefetch instruction insertion.
3 Assignments
0 Petitions
Accused Products
Abstract
A compiler that facilitates efficient insertion of explicit data prefetch instructions into loop structures within applications uses simple address expression analysis to determine data prefetching requirements. Analysis and explicit data cache prefetch instruction insertion are performed by the compiler in a machine-instruction level optimizer to provide access to more accurate expected loop iteration latency information. Such prefetch instruction insertion strategy tolerates worst-case alignment of user data structures relative to data cache lines. Execution profiles from previous runs of an application are exploited in the insertion of prefetch instructions into loops with internal control flow. Cache line reuse patterns across loop iterations are recognized to eliminate unnecessary prefetch instructions. The prefetch insertion algorithm is integrated with other low-level optimization phases, such as loop unrolling, register reassociation, and instruction scheduling. An alternative embodiment of the compiler limits the insertion of explicit prefetch instructions to those situations where the lower bound on the achievable loop iteration latency is unlikely to be increased as a result of the insertion.
-
Citations
17 Claims
-
1. A compiler, comprising:
-
means in a low level optimizer for analyzing and efficiently inserting explicit data prefetch instructions into loops of applications; subscript expression analysis means for determining data prefetching requirements; means for recognizing cache line reuse patterns across loop iterations to eliminate unnecessary prefetch instructions; and means for limiting insertion of explicit prefetch instructions to situations where a lower bound on an achievable loop iteration latency is unlikely to be increased as a result of said prefetch instruction insertion. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method for mitigating or eliminating cache misses in a low level optimizer, comprising the steps of:
-
performing loop body analysis; unrolling loops to reduce prefetch instruction overhead; identifying uniformly generated equivalence classes of memory references in a code stream, where said equivalence classes represent disjoint sets of memory references occurring in a loop whose address expressions can be expressed as a linear function of the same basic loop induction variable and are known to differ only by a compile time constant, allowing the detection of group spatial and group temporal locality among said different memory references; computing an effective memory stride for each of the equivalence classes; determining the number of prefetch instructions needed for full cache miss coverage for each equivalence class, where the number of prefetch instructions that needs to be inserted is a function of the style of prefetching desired, including dumb prefetching that inserts an explicit prefetch instruction for each memory reference, baseline prefetching that inserts as many prefetch instructions as possible without affecting the resource minimum loop iteration latency, and selective prefetching that inserts as many prefetch instructions as are required to ensure full cache miss coverage, exploiting any group-spatial or group-temporal locality that may be apparent among memory references within a uniformly generated equivalence class; and inserting prefetch instructions identified into said code stream. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
Specification