Speculative pre-flush of data in an out-of-order execution processor system
Abstract
Speculative pre-fetching and pre-flushing of additional cache lines minimize the cache miss latency and coherency check latency of an out-of-order instruction execution processor. A pre-fetch/pre-flush slot (DPRESLOT) is provided in a memory queue (MQUEUE) of the out-of-order execution processor. The DPRESLOT monitors transactions between a system interface, e.g., the system bus, and an address reorder buffer slot (ARBSLOT) and/or between the system interface and a cache coherency check slot (CCCSLOT). When a cache miss is detected, the DPRESLOT causes one or more cache lines, in addition to the data line that caused the current cache miss, to be pre-fetched from the memory hierarchy into the cache memory (DCACHE) in anticipation that the additional data will be required in the near future. When a cache write-back is detected as a result of a cache coherency check, the DPRESLOT causes one or more cache lines, in addition to the data line currently being written back, to be pre-flushed to the memory hierarchy from the respective cache memory (DCACHE) of the processor that owns the line, in anticipation that the additional data will be required by the requesting processor in the near future. Logic included in the DPRESLOT prevents a cache miss request for the additional data when another request has already been made for the data.
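As a rough software analogy of the hardware described in the abstract (all names and event labels here are illustrative, not from the patent), the DPRESLOT's decision logic can be sketched as:

```python
def dpreslot_monitor(event, addr, pending, adjacent_of):
    """Illustrative model of the DPRESLOT decision in the abstract.

    event       -- "cache_miss" or "coherency_writeback" (assumed labels)
    addr        -- address of the data line that triggered the event
    pending     -- addresses for which a request is already outstanding
    adjacent_of -- maps a line address to its adjacent line's address
    """
    extra = adjacent_of(addr)
    if extra in pending:
        return None  # duplicate-request logic: another request is in flight
    if event == "cache_miss":
        return ("prefetch", extra)   # pull the adjacent line into DCACHE
    if event == "coherency_writeback":
        return ("preflush", extra)   # push the adjacent line to the requester
    return None
```

The single suppression check models the stated role of the DPRESLOT logic: one speculative request per adjacent line, never a second while the first is outstanding.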
18 Claims
1. An apparatus for minimizing cache coherency check latency in an out of order instruction execution system having a plurality of processors, comprising:
at least one cache coherency check mechanism associated with a first one of said plurality of processors, said at least one cache coherency check mechanism being configured to output a presence signal indicating that a first data line being requested by a second one of said plurality of processors is present in a cache memory associated with said first one of said plurality of processors;
at least one pre-flush slot configured to, upon receipt of said presence signal, determine at least one additional data line to be pre-flushed from said cache memory associated with said first one of said plurality of processors to said second one of said plurality of processors, and a logic associated with said at least one pre-flush slot, said logic configured to provide an indication whether said at least one additional data line is already being flushed from said cache memory.
2. The apparatus for minimizing cache coherency check latency according to claim 1, further comprising:
an adjacent address logic configured to provide one or more additional addresses corresponding to said at least one additional data line, said at least one additional data line having a memory location adjacent to said first data line.
3. The apparatus for minimizing cache coherency check latency according to claim 2, wherein:
said adjacent address logic receives a first address corresponding to said first data line, and provides said one or more additional addresses by inverting one or more bits of said first address.
4. The apparatus for minimizing cache coherency check latency according to claim 3, wherein:
said one or more bits of said first address comprises a least significant bit of said first address.
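Claims 3 and 4 describe deriving the additional address by inverting one or more bits of the first address, including a least significant bit. A minimal software sketch, assuming the inverted bit is the least significant bit of the line index and a 64-byte line size (both the helper name and the line size are assumptions, not from the claims):

```python
LINE_SIZE = 64  # assumed cache-line size in bytes

def adjacent_line_address(addr: int) -> int:
    """Invert the least significant bit of the line-index portion of
    addr, yielding the address of the memory-adjacent data line."""
    return addr ^ LINE_SIZE

# Lines at 0x1000 and 0x1040 are adjacent: inverting that one bit
# maps each address to the other, so the operation is its own inverse.
```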
5. The apparatus for minimizing cache coherency check latency according to claim 2, further comprising:
a busy latch having a set input and a clear input, said busy latch being configured to output a busy signal, and said busy signal being active when said set input is triggered, and inactive when said clear input is triggered; and
a register configured to store a cache index and a tag, both of which being derived from an address received from said adjacent address logic, said register receiving said address from said adjacent address logic upon receipt of an update signal, said update signal being produced by inverting said busy signal.
6. The apparatus for minimizing cache coherency check latency according to claim 5, further comprising:
a decode logic for receiving a transaction type, said decode logic being configured to trigger said set input of said busy latch when said received transaction type indicates a receipt of said presence signal.
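Claims 5 and 6 describe a busy latch set by the decode logic on a presence-signal transaction, and a register whose update signal is the inverted busy signal, so a new cache index and tag are captured only while the slot is idle. A behavioral sketch (the class name, transaction-type label, and bit-field split are all illustrative assumptions):

```python
class PreFlushSlot:
    """Behavioral model of the busy latch and register of claims 5 and 6."""
    CCC_HIT = "ccc_hit"  # assumed transaction type indicating a presence signal

    def __init__(self):
        self.busy = False   # busy latch output (set/clear inputs below)
        self.index = None   # cache index derived from the captured address
        self.tag = None     # tag derived from the captured address

    def on_transaction(self, txn_type, addr):
        # Update signal = inverted busy signal: the register captures a
        # new address only while the slot is not busy.
        if not self.busy:
            self.index = (addr >> 6) & 0x3F  # assumed index field
            self.tag = addr >> 12            # assumed tag field
        # Decode logic: a presence-signal transaction triggers the set input.
        if txn_type == self.CCC_HIT:
            self.busy = True

    def clear(self):
        self.busy = False   # clear input deactivates the busy signal
```

While the latch is set, later transactions leave the register unchanged, which is the hold behavior the inverted-busy update signal of claim 5 implies.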
7. The apparatus for minimizing cache coherency check latency according to claim 6, wherein:
said at least one pre-flush slot is configured to determine whether said at least one additional data line is present in said cache memory when said busy signal is active.
8. The apparatus for minimizing cache coherency check latency according to claim 7, wherein:
said at least one cache coherency check mechanism is configured to output said presence signal when said first data line has been flushed to said second one of said plurality of processors from said cache memory.
9. The apparatus for minimizing cache coherency check latency according to claim 7, wherein:
said at least one pre-flush slot is configured to cause said at least one additional data line to be flushed to said second one of said plurality of processors from said cache memory when said busy signal is active.
10. The apparatus for minimizing cache coherency check latency according to claim 9, wherein:
said at least one pre-flush slot is configured to cause said at least one additional data line to be flushed to said second one of said plurality of processors from said cache memory if said at least one additional data line is determined to be present in said cache memory.
11. A method of minimizing cache coherency check latency in an out of order instruction execution system having a plurality of processors, comprising:
detecting a request for access to a first data line from a memory hierarchy, said request being made by a first one of said plurality of processors;
determining whether said first data line is present in a cache memory associated with a second one of said plurality of processors;
calculating an address of at least one additional data line to be pre-flushed from said cache memory to said second one of said plurality of processors; and
determining whether a previously made request for said at least one additional data line from said cache memory is pending.
12. The method of minimizing cache coherency check latency in accordance with claim 11, wherein said calculating comprises:
inverting one or more bits of an address of said first data line.
13. The method of minimizing cache coherency check latency in accordance with claim 12, wherein:
said one or more bits comprises a least significant bit.
14. The method of minimizing cache coherency check latency in accordance with claim 11, further comprising:
if said previously made request is pending, preventing flushing of said at least one additional data line to said second one of said plurality of processors.
15. The method of minimizing cache coherency check latency in accordance with claim 14, further comprising:
if said previously made request is not pending, issuing a request for said at least one additional data line to be flushed to said second one of said plurality of processors.
16. The method of minimizing cache coherency check latency in accordance with claim 14, further comprising:
determining whether said at least one additional data line is present in said cache memory.
17. The method of minimizing cache coherency check latency in accordance with claim 16, further comprising:
if said at least one additional data line is present in said cache memory, issuing a request for said at least one additional data line to be flushed to said second one of said plurality of processors.
18. The method of minimizing cache coherency check latency in accordance with claim 16, further comprising:
if said at least one additional data line is not present in said cache memory, preventing said at least one additional data line from being flushed to said second one of said plurality of processors.
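Taken together, method claims 11 through 18 describe a single decision flow: detect the coherency hit, compute the adjacent line, suppress duplicates, and flush only lines actually present. A software sketch of that flow (the claims describe hardware behavior; the 64-byte line size and every name here are assumptions):

```python
def pre_flush_method(first_addr, cache, pending):
    """Sketch of the method of claims 11-18 (illustrative names).

    first_addr -- address of the line requested by the other processor
    cache      -- set of line addresses present in this processor's cache
    pending    -- set of lines whose flush request is already pending
    Returns the action taken, for illustration.
    """
    # Claims 12-13: derive the additional line by inverting a least
    # significant (line-index) bit of the first address.
    extra = first_addr ^ 0x40
    if extra in pending:
        return "suppressed"    # claim 14: a prior request is still pending
    if extra not in cache:
        return "not_present"   # claim 18: line absent, prevent the flush
    pending.add(extra)         # claims 15 and 17: issue the flush request
    return ("flush", extra)
```

The ordering of the two guards mirrors the claim dependencies: the pending check of claim 14 gates everything, and the presence check of claims 16-18 gates the actual flush.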
Specification