Empirically based dynamic control of transmission of victim cache lateral castouts
Abstract
In response to a data request, a victim cache line is selected for castout from a first lower level cache, and a target lower level cache of one of the plurality of processing units is selected. A determination is made whether the selected target lower level cache has provided more than a threshold number of retry responses to lateral castout (LCO) commands of the first lower level cache, and if so, a different target lower level cache is selected. The first processing unit thereafter issues an LCO command on the interconnect fabric. The LCO command identifies the victim cache line to be castout and indicates that the target lower level cache is an intended destination of the victim cache line. In response to a successful coherence response to the LCO command, the victim cache line is removed from the first lower level cache and held in the second lower level cache.
44 Claims
1. A method of data processing in a data processing system including a plurality of lower level caches and a plurality of architecturally distributed system memories coupled by an interconnect fabric, wherein the plurality of lower level caches includes first, second and third lower level caches, wherein the first lower level cache is associated with a first processing unit having a first processor core and an associated first upper level cache and the second lower level cache is associated with a second processing unit having a second processor core and an associated second upper level cache, said method comprising:
in response to a data request, the first processing unit;
selecting a victim cache line to be castout from the first lower level cache;
selecting a target lower level cache among the plurality of lower level caches, wherein said selecting a target lower level cache comprises selecting a target lower level cache based upon architectural proximity of the target lower level cache to a home system memory among the plurality of system memories to which an address of the victim cache line is assigned;
the first processing unit thereafter issuing a lateral castout (LCO) command on the interconnect fabric, wherein the LCO command identifies the victim cache line to be castout from the first lower level cache and indicates that the target lower level cache is a single intended destination of the victim cache line among the plurality of lower level caches; and
in response to a coherence response to the LCO command indicating success of the LCO command, removing the victim cache line from the first lower level cache and holding the victim cache line in the second lower level cache.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
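The steps of claim 1 can be sketched as a small simulation. This is an illustration under stated assumptions, not the claimed implementation: the `LowerLevelCache` class, the `home_node` address map (fixed-size regions interleaved across nodes), the use of `abs(node - home)` as a stand-in for "architectural proximity", and the `"success"`/`"retry"` response strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LowerLevelCache:
    name: str
    node: int                         # hypothetical: node where the cache resides
    capacity: int = 2
    lines: set = field(default_factory=set)

    def snoop_lco(self, target_name: str):
        """Respond to an LCO command; only the single named target may accept."""
        if self.name != target_name:
            return None               # not the intended destination
        return "retry" if len(self.lines) >= self.capacity else "success"


def home_node(address: int, num_nodes: int, region_size: int = 1 << 20) -> int:
    """Assumed address map: which node's system memory owns the address."""
    return (address // region_size) % num_nodes


def select_target(source, caches, address, num_nodes):
    """Choose the other cache closest to the victim line's home memory."""
    home = home_node(address, num_nodes)
    others = [c for c in caches if c is not source]
    return min(others, key=lambda c: abs(c.node - home))


def lateral_castout(source, caches, address, num_nodes):
    """Issue the LCO; move the line only if the coherence response is success."""
    target = select_target(source, caches, address, num_nodes)
    responses = [c.snoop_lco(target.name) for c in caches if c is not source]
    if "success" in responses:
        source.lines.discard(address)  # remove from the first cache
        target.lines.add(address)      # hold in the target cache
        return target
    return None                        # line stays in the source cache


a = LowerLevelCache("L3_a", node=0)
b = LowerLevelCache("L3_b", node=1)
c = LowerLevelCache("L3_c", node=2)
victim = 2 * (1 << 20) + 0x80          # falls in the region homed on node 2
a.lines.add(victim)
dest = lateral_castout(a, [a, b, c], victim, num_nodes=3)
```

In this run the victim line's home memory is on node 2, so `L3_c` is the single intended destination, and the line moves there once the response indicates success.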
9. A data processing system, comprising:
an interconnect fabric;
a plurality of processing units coupled to the interconnect fabric, wherein the plurality of processing units are supported by a plurality of lower level caches including first, second and third lower level caches, wherein the first lower level cache is associated with a first processing unit having a first processor core and an associated first upper level cache and the second lower level cache is associated with a second processing unit having a second processor core and an associated second upper level cache; and
a plurality of architecturally distributed system memories coupled to the interconnect fabric, wherein the plurality of architecturally distributed system memories includes a home system memory that is assigned a plurality of addresses including an address;
wherein the first processing unit, in response to a data request, selects a victim cache line associated with the address to be castout from the first lower level cache, selects a target lower level cache among the plurality of lower level caches based upon architectural proximity of the target lower level cache to the home system memory to which the address of the victim cache line is assigned, and thereafter issues a lateral castout (LCO) command on the interconnect fabric, the LCO command identifying the victim cache line to be castout from the first lower level cache and indicating that the target lower level cache is a single intended destination of the victim cache line among the plurality of lower level caches; and
wherein responsive to a coherence response to the LCO command indicating success of the LCO command, the first processing unit removes the victim cache line from the first lower level cache and the second lower level cache holds the victim cache line.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A processing unit for a data processing system including a plurality of architecturally distributed system memories including a home system memory, a plurality of processing units coupled by an interconnect fabric, and a plurality of lower level caches supporting the plurality of processing units, wherein the home system memory is assigned a plurality of addresses including an address, the processing unit comprising:
the processing unit has a first processor core and associated first upper and first lower level caches;
wherein the processing unit, in response to a data request, selects a victim cache line associated with the address to be castout from the first lower level cache, selects a target lower level cache among the plurality of lower level caches based upon architectural proximity of the target lower level cache to the home system memory to which the address of the victim cache line is assigned, and thereafter issues a lateral castout (LCO) command on the interconnect fabric, the LCO command identifying the victim cache line to be castout from the first lower level cache and indicating that the target lower level cache is a single intended destination of the victim cache line among the plurality of lower level caches; and
wherein responsive to a coherence response to the LCO command indicating success of the LCO command, the first processing unit removes the victim cache line from the first lower level cache for storage in a second lower level cache.
Dependent claims: 18, 19, 20, 21, 22, 23
24. A method of data processing in a data processing system including a plurality of processing units and a system memory coupled by an interconnect fabric, wherein each of the plurality of processing units includes a processor core and associated upper and lower level caches, said method comprising:
in response to a data request of the processor core of a first processing unit among the plurality of processing units, selecting a victim cache line to be removed from the lower level cache of the first processing unit;
selecting among a lateral castout (LCO) from the lower level cache of the first processing unit to another lower level cache and a castout (CO) from the lower level cache of the first processing unit to the system memory, wherein the selecting comprises selecting based upon empirical success of the upper level cache of the first processing unit in obtaining requested data by intervention from cache memories of other of the plurality of processing units rather than the system memory;
in response to selecting an LCO, performing an LCO of the victim cache line to a lower level cache of one of the plurality of processing units by issuing a LCO command on the interconnect fabric; and
in response to selecting a CO, performing a CO of the victim cache line to the system memory.
Dependent claims: 25, 26, 27, 28, 29, 30
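The choice between an LCO and a CO in claim 24 can be sketched as a counter-based policy. The sketch makes several assumptions that the claim does not state: the `CastoutPolicy` class and its field names are hypothetical, the threshold values are invented, a high intervention rate is assumed to favor an LCO, and a CO is assumed as the default while too few fills have been observed.

```python
from dataclasses import dataclass


@dataclass
class CastoutPolicy:
    """Hypothetical tracker of where the upper level cache's fills came from."""
    min_intervention_rate: float = 0.25   # assumed threshold
    min_samples: int = 8                  # assumed warm-up length
    from_cache: int = 0                   # fills sourced by cache-to-cache intervention
    from_memory: int = 0                  # fills sourced from system memory

    def record_fill(self, source: str) -> None:
        """Record the source of one completed data request."""
        if source == "intervention":
            self.from_cache += 1
        elif source == "memory":
            self.from_memory += 1
        else:
            raise ValueError(f"unknown fill source: {source!r}")

    def choose(self) -> str:
        """Return 'LCO' or 'CO' for the next victim cache line."""
        total = self.from_cache + self.from_memory
        if total < self.min_samples:
            return "CO"                   # assumption: be conservative with little data
        rate = self.from_cache / total
        return "LCO" if rate >= self.min_intervention_rate else "CO"


policy = CastoutPolicy()
for _ in range(6):
    policy.record_fill("intervention")
for _ in range(2):
    policy.record_fill("memory")
decision = policy.choose()  # 6 of 8 fills came by intervention, so an LCO is chosen
```

The intuition assumed here is that if other caches frequently supply requested data, a line kept in a peer cache is likely to be found there again, which makes a lateral castout worthwhile; when most data comes from system memory anyway, a plain castout avoids the extra fabric traffic.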
31. A data processing system, comprising:
a plurality of processing units and a system memory coupled by an interconnect fabric, wherein each of the plurality of processing units includes a processor core and associated upper and lower level caches, said plurality of processing units including a first processing unit;
wherein the lower level cache of the first processing unit, responsive to a data request of the processor core of a first processing unit, selects a victim cache line to be removed from the lower level cache of the first processing unit and selects among a lateral castout (LCO) to another lower level cache and a castout (CO) to the system memory, wherein the lower level cache of the first processing unit selects either the LCO or CO based upon empirical success of the upper level cache of the first processing unit in obtaining requested data by intervention from cache memories of other of the plurality of processing units rather than the system memory; and
wherein the lower level cache of the first processing unit, responsive to selecting an LCO, performs an LCO of the victim cache line to a lower level cache of one of the plurality of processing units by issuing a LCO command on the interconnect fabric and, responsive to selecting a CO, performs a CO of the victim cache line to the system memory.
Dependent claims: 32, 33, 34, 35, 36, 37
38. A processing unit for a data processing system having a plurality of processing units and a system memory coupled by an interconnect fabric, wherein each of the plurality of processing units includes a processor core and associated upper and lower level caches, said processing unit comprising:
a processor core;
an upper level cache coupled to the processor core; and
a lower level cache coupled to the upper level cache;
wherein the lower level cache, responsive to a data request of the processor core, selects a victim cache line to be removed from the lower level cache and selects among a lateral castout (LCO) to another lower level cache and a castout (CO) to the system memory, wherein the lower level cache selects either the LCO or CO based upon empirical success of the upper level cache in obtaining requested data by intervention from cache memories of other of the plurality of processing units rather than the system memory; and
wherein the lower level cache, responsive to selecting an LCO, performs an LCO of the victim cache line to a lower level cache of one of the plurality of processing units by issuing a LCO command on the interconnect fabric and, responsive to selecting a CO, performs a CO of the victim cache line to the system memory.
Dependent claims: 39, 40, 41, 42, 43, 44
Specification