Data reorganization in non-uniform cache access caches
First Claim
1. A method, comprising:
setting a first plurality of direction bits for a first cache line of a first way, wherein data of the first cache line is located in a first bank of a plurality of banks of a non-uniform cache access (NUCA) cache, wherein further sets of the NUCA cache are horizontally distributed across the plurality of banks;
setting a second plurality of direction bits for a second cache line of a second way, wherein data of the second cache line is located in a second bank; and
moving data of the first cache line to the second bank and data of the second cache line to the first bank to reduce access latency between at least one of the first and second cache lines to at least one processor, wherein the moving is based upon a calculation which uses the first and second plurality of direction bits.
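The swap decision in claim 1 can be illustrated with a short sketch. This is a hypothetical model, not the patented implementation: here the "direction bits" of each line are simplified to the index of the bank the line's requesting processor sits nearest, and the "calculation" is a comparison of total bank distance before and after the swap. All names are illustrative.

```python
# Hypothetical sketch: direction bits record which bank a cache line's
# accesses favor; two lines swap banks when the exchange reduces the
# combined distance between each line and its requesters.

def distance(bank: int, target_bank: int) -> int:
    """Hop count between two banks in a linear bank arrangement (assumed)."""
    return abs(bank - target_bank)

def should_swap(line_a_bank: int, line_a_dir: int,
                line_b_bank: int, line_b_dir: int) -> bool:
    """Direction bits are modeled as the bank index a line 'wants' to
    move toward -- an assumption made for illustration only."""
    current = distance(line_a_bank, line_a_dir) + distance(line_b_bank, line_b_dir)
    swapped = distance(line_b_bank, line_a_dir) + distance(line_a_bank, line_b_dir)
    return swapped < current

# Line A sits in bank 0 but its direction bits point to bank 3; line B
# sits in bank 3 but points to bank 0: swapping moves both lines closer.
print(should_swap(0, 3, 3, 0))  # True
```

Under this toy model, a swap is taken only when it strictly reduces the summed distance, which mirrors the claim's requirement that the move "reduce access latency" for at least one of the lines.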
1 Assignment
0 Petitions
Abstract
Embodiments that dynamically reorganize data of cache lines in non-uniform cache access (NUCA) caches are contemplated. Various embodiments comprise a computing device, having one or more processors coupled with one or more NUCA cache elements. The NUCA cache elements may comprise one or more banks of cache memory, wherein ways of the cache are horizontally distributed across multiple banks. To improve access latency of the data by the processors, the computing devices may dynamically propagate cache lines into banks closer to the processors using the cache lines. To accomplish such dynamic reorganization, embodiments may maintain “direction” bits for cache lines. The direction bits may indicate to which processor the data should be moved. Further, embodiments may use the direction bits to make cache line movement decisions.
18 Citations
19 Claims
1. A method, comprising:

setting a first plurality of direction bits for a first cache line of a first way, wherein data of the first cache line is located in a first bank of a plurality of banks of a non-uniform cache access (NUCA) cache, wherein further sets of the NUCA cache are horizontally distributed across the plurality of banks;

setting a second plurality of direction bits for a second cache line of a second way, wherein data of the second cache line is located in a second bank; and

moving data of the first cache line to the second bank and data of the second cache line to the first bank to reduce access latency between at least one of the first and second cache lines to at least one processor, wherein the moving is based upon a calculation which uses the first and second plurality of direction bits.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus, comprising:

a latency module to determine access latencies between a plurality of processors and a plurality of banks of a non-uniform cache access (NUCA) cache, wherein ways are horizontally distributed across banks of the NUCA cache, wherein the latency module is configured to determine the access latencies via direction bits for cache lines of the ways; and

a data movement module to move data of a first cache line from a first bank of the plurality of banks to a second bank of the plurality of banks and move data of a second cache line from the second bank to the first bank, wherein the data movement module is configured to move the first and second cache lines based upon the determined access latencies of the latency module.

Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
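The two modules named in claim 9 can be sketched as follows. This is an assumed structure for illustration only: direction bits are modeled as a pair of small saturating counters (one per side of the cache) updated on each access, the latency module reads them to pick a preferred direction, and the data movement module shifts a line one bank that way. The patent does not specify this encoding; every name here is hypothetical.

```python
# Hypothetical sketch of claim 9's latency module and data movement module.

class DirectionBits:
    """Per-line direction bits modeled as two saturating counters (assumed)."""
    def __init__(self, width: int = 2):
        self.left = 0
        self.right = 0
        self.max = (1 << width) - 1  # saturation point for a 'width'-bit counter

    def record_access(self, from_left: bool) -> None:
        if from_left:
            self.left = min(self.left + 1, self.max)
        else:
            self.right = min(self.right + 1, self.max)

class LatencyModule:
    """Reads a line's direction bits to estimate where its requesters sit."""
    def preferred_direction(self, bits: DirectionBits) -> str:
        return "left" if bits.left >= bits.right else "right"

class DataMovementModule:
    """Moves a line one bank toward its requesters, staying within bounds."""
    def maybe_move(self, bank: int, nbanks: int, direction: str) -> int:
        if direction == "left":
            return max(bank - 1, 0)
        return min(bank + 1, nbanks - 1)

bits = DirectionBits()
bits.record_access(from_left=True)
bits.record_access(from_left=True)  # two accesses from left-side processors
lat = LatencyModule()
mover = DataMovementModule()
new_bank = mover.maybe_move(bank=2, nbanks=4,
                            direction=lat.preferred_direction(bits))
print(new_bank)  # 1: one bank closer to the left-side processors
```

The split mirrors the claim's division of labor: the latency module only interprets direction bits, while the data movement module performs the actual relocation based on that interpretation.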
18. A system comprising:

a plurality of processors;

a plurality of banks of a non-uniform cache access (NUCA) cache, wherein the plurality of processors are coupled to the NUCA cache and arranged to search ways of the NUCA cache, wherein further the ways are horizontally distributed across multiple banks of the NUCA cache; and

a cache controller to evaluate access latencies between the plurality of processors and banks storing cache lines requested by the plurality of processors, wherein the cache controller is configured to swap data of the cache lines between pairs of banks only when at least one of the pairs of cache lines has been consecutively accessed by a processor, and evaluation of the access latencies comprises calculating access latencies for pairs of cache lines stored in pairs of banks to determine whether swapping the cache lines between the pairs of banks reduces access latency between at least one cache line of the pair and a processor that last requested the at least one cache line.

Dependent claims: 19
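Claim 18's gating condition, swapping only after consecutive accesses by the same processor, can be sketched as a small eligibility tracker. This is an illustrative reading of the claim, not the patented cache controller; the class and method names are invented for the example.

```python
# Hypothetical sketch of the claim-18 gate: a cache line becomes eligible
# for a bank swap only once the same processor has accessed it twice in a
# row, filtering out one-off accesses that would cause needless movement.

class SwapGate:
    def __init__(self):
        self.last_requester = {}  # line_id -> id of last requesting processor
        self.run_length = {}      # line_id -> length of current access run

    def record(self, line_id: str, processor: int) -> bool:
        """Record an access; return True when the line is swap-eligible."""
        if self.last_requester.get(line_id) == processor:
            self.run_length[line_id] = self.run_length.get(line_id, 1) + 1
        else:
            self.run_length[line_id] = 1  # a different processor resets the run
        self.last_requester[line_id] = processor
        return self.run_length[line_id] >= 2

gate = SwapGate()
print(gate.record("A", 0))  # False: first access to the line
print(gate.record("A", 1))  # False: different processor resets the run
print(gate.record("A", 1))  # True: two consecutive accesses by processor 1
```

Once a line passes this gate, the controller would still perform the latency evaluation the claim describes, comparing access latencies before and after a candidate swap, before actually moving data.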
Specification