Software prefetch system and method for predetermining amount of streamed data
Abstract
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. The prefetch engine prefetches cache lines into each of the first, second, and third level caches. Prefetch requests from the prefetch engine to the second and third level caches travel over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction accelerates the prefetch process by overriding the normal functionality of the hardware prefetch engine; the instruction also limits the amount of data to be prefetched.
16 Claims
1. A data processing system comprising:
a processor having a load/store unit including a level one (L1) cache;
a prefetch engine coupled to the load/store unit;
a level two (L2) cache coupled to the L1 cache and the prefetch engine;
a level three (L3) cache coupled to the L2 cache and to the prefetch engine; and
circuitry operable for overriding the prefetch engine in response to a single instruction executed in the processor, wherein the overriding circuitry causes the prefetch engine to prefetch a predetermined stream of cache lines concurrently into the L1, L2, and L3 caches.
(Dependent claims: 2, 3, 4, 5, 6)
7. A multiprocessor system comprising:
a first processor including a first level one (L1) cache and a first prefetch engine;
a second processor including a second L1 cache and a second prefetch engine;
a level two (L2) cache shared by the first and second processors;
a fabric coupled to the L2 cache and adaptable for coupling to a third processor;
a level three (L3) cache;
system memory coupled to the L3 cache;
a first bus system for transferring data between the first L1 cache, L2 cache, and L3 cache and load misses from the first L1 cache to the first prefetch engine;
a second bus system for transferring data between the second L1 cache, L2 cache, and L3 cache and load misses from the second L1 cache to the second prefetch engine;
first circuitry operable for overriding the first prefetch engine in response to a first single instruction executed in the first processor, wherein the first overriding circuitry causes the first prefetch engine to prefetch a predetermined first stream of cache lines concurrently into the first L1, L2, and L3 caches.
(Dependent claims: 8, 9, 10)

8. The system as recited in claim 7, wherein the first prefetch engine further comprises:
circuitry for sending a prefetch request from the prefetch engine to the L2 cache over the private prefetch request bus; and
circuitry for prefetching cache line n+1 into the L1 cache in response to the prefetch request over the bus system.
9. The system as recited in claim 8, wherein the first prefetch engine further comprises:
circuitry for prefetching cache line n+2 into the L2 cache in response to the prefetch request.
10. The system as recited in claim 8, wherein the first prefetch engine further comprises:
circuitry for prefetching a block of N cache lines into the L3 cache in response to the prefetch request and the signal, where N is greater than 1.
11. In a data processing system comprising a processor having a load/store unit including a level one (L1) cache, a prefetch engine coupled to the load/store unit, a level two (L2) cache coupled to the L1 cache and the prefetch engine, and a level three (L3) cache coupled to the L2 cache and to the prefetch engine, a method comprising the steps of:
overriding the prefetch engine in response to a single instruction executed in the processor; and
in response to the overriding step, causing the prefetch engine to prefetch a predetermined stream of cache lines concurrently into the L1, L2, and L3 caches.
(Dependent claims: 12, 13, 14)
15. In a multiprocessor system comprising a first processor including a first level one (L1) cache and a first prefetch engine, a second processor including a second L1 cache and a second prefetch engine, a level two (L2) cache shared by the first and second processors, a fabric coupled to the L2 cache and adaptable for coupling to a third processor, a level three (L3) cache, system memory coupled to the L3 cache, a first bus system for transferring data between the first L1 cache, L2 cache, and L3 cache and load misses from the first L1 cache to the first prefetch engine, and a second bus system for transferring data between the second L1 cache, L2 cache, and L3 cache and load misses from the second L1 cache to the second prefetch engine, a method comprising the steps of:
overriding the first prefetch engine in response to a first single instruction executed in the first processor; and
in response to the overriding step, causing the first prefetch engine to prefetch a predetermined first stream of cache lines concurrently into the first L1, L2, and L3 caches.
(Dependent claim: 16)

16. The method as recited in claim 15, further comprising the steps of:
sending a prefetch request from the prefetch engine to the L2 cache over the private prefetch request bus; and
prefetching cache line n+1 into the L1 cache in response to the prefetch request over the bus system.
Specification