Adaptively handling remote atomic execution based upon contention prediction
Abstract
In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core; if contention is predicted, data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.
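The two behaviors described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `RemoteRequest`, `handle_instruction`, `local_ops`, and `select_agent` are hypothetical, and the prediction is passed in as a plain boolean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class RemoteRequest:
    """Hypothetical container for an instruction marshaled with its data."""
    opcode: str
    operands: Tuple[int, ...]
    agent: str


def handle_instruction(
    opcode: str,
    operands: Tuple[int, ...],
    contention_predicted: bool,
    local_ops: Dict[str, Callable[..., int]],
    select_agent: Callable[[str], str],
):
    """Execute locally when no contention is predicted; otherwise build a
    remote execution request addressed to a selected agent."""
    if not contention_predicted:
        # No contention predicted: run the operation in the "core" itself.
        return local_ops[opcode](*operands)
    # Contention predicted: marshal the operands and name the remote agent.
    return RemoteRequest(opcode, tuple(operands), select_agent(opcode))


ops = {"add": lambda a, b: a + b}
local_result = handle_instruction("add", (2, 3), False, ops, lambda op: "agent0")
remote_result = handle_instruction("add", (2, 3), True, ops, lambda op: "agent0")
```

In this model the local path returns the computed value, while the contended path returns a request object that a separate agent would consume.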
29 Citations
20 Claims
1. A method comprising:
- receiving a first instruction for decoding in a first core of a processor and determining whether contention is predicted with respect to the first instruction; and
- if no contention is predicted executing the first instruction in the first core, and if contention is predicted obtaining data associated with the first instruction in the first core and sending a remote execution request with the obtained data to a selected remote agent for execution of the first instruction, the prediction based at least in part on whether a contention vector of an entry of a directory associated with the obtained data indicates that a plurality of cores sought to atomically access the obtained data during a historical window including a first time period and a second time period, the contention vector having a plurality of core fields each associated with a core, each core field including a first indicator to indicate access of a cacheline via an atomic operation by the associated core within the first time period and a second indicator to indicate access of the cacheline via an atomic operation by the associated core within the second time period.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
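The contention vector recited in claim 1 can be modeled as one two-indicator field per core. The sketch below is an illustration under assumptions the claim does not state: a core counts toward the "plurality" if either of its indicators is set, and two or more such cores predict contention. The class and method names are hypothetical.

```python
class ContentionVector:
    """Toy model: one [first_indicator, second_indicator] pair per core."""

    def __init__(self, num_cores: int) -> None:
        self.fields = [[False, False] for _ in range(num_cores)]

    def record_atomic_access(self, core: int, period: int) -> None:
        """Set the indicator for `core`; period 0 is the first time period,
        period 1 is the second time period."""
        self.fields[core][period] = True

    def cores_seen(self) -> int:
        """Count cores with an atomic access in either period (assumption)."""
        return sum(1 for first, second in self.fields if first or second)

    def predicts_contention(self) -> bool:
        """Assumed rule: two or more cores in the window means contention."""
        return self.cores_seen() >= 2


vec = ContentionVector(num_cores=4)
vec.record_atomic_access(core=0, period=0)
single_core_prediction = vec.predicts_contention()
vec.record_atomic_access(core=2, period=1)
two_core_prediction = vec.predicts_contention()
```

With a single core recorded the model predicts no contention; once a second core appears anywhere in the two-period window, it predicts contention.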
11. A processor comprising:
- a plurality of cores, each of the cores including:
  - a front end to decode an instruction and to cause the instruction to be executed in an execution logic of the core if no contention is predicted, and if contention is predicted, to cause the instruction to be sent with data associated with the instruction to a selected remote agent for execution of the instruction; and
  - a predictor having a plurality of entries each to store a prediction as to whether an instruction associated with the entry is to be contended during execution; and
- a directory having a plurality of entries each to store tag information regarding a cacheline present in one or more cores of the processor, each entry further including a contention vector having a plurality of core fields each associated with a core, each core field including a first indicator and a second indicator each to indicate access of the cacheline via an atomic operation by the associated core within a time period.
Dependent claims: 12, 13, 14, 15, 16
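The two storage structures in claim 11, the directory entry and the predictor, can be sketched as simple records. The layout is an assumption made for illustration only: the contention vector is packed as two bits per core in an integer, and the predictor is indexed by instruction address. Neither detail comes from the claim, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DirectoryEntry:
    """Tag plus a packed contention vector (assumed: 2 bits per core)."""
    tag: int
    contention_vector: int = 0

    def mark_atomic(self, core: int, indicator: int) -> None:
        """Set indicator 0 (first) or 1 (second) in the field for `core`."""
        self.contention_vector |= 1 << (2 * core + indicator)

    def core_field(self, core: int) -> int:
        """Return the 2-bit field for `core`."""
        return (self.contention_vector >> (2 * core)) & 0b11


@dataclass
class Predictor:
    """Per-instruction prediction table (assumed: keyed by address)."""
    entries: Dict[int, bool] = field(default_factory=dict)

    def predict(self, pc: int) -> bool:
        # Instructions with no entry default to "not contended" (assumption).
        return self.entries.get(pc, False)

    def update(self, pc: int, contended: bool) -> None:
        self.entries[pc] = contended


entry = DirectoryEntry(tag=0x40)
entry.mark_atomic(core=1, indicator=0)
entry.mark_atomic(core=3, indicator=1)

predictor = Predictor()
before = predictor.predict(0x1000)
predictor.update(0x1000, True)
after = predictor.predict(0x1000)
```

Packing the fields into one integer mirrors how such a vector might sit alongside the tag in a directory entry, but a real design could choose a different encoding.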
17. A system comprising:
- a processor having a front end to decode an instruction and determine whether contention is predicted with respect to the instruction, a dynamic execution logic to execute the instruction if no contention is predicted and otherwise to obtain data associated with the instruction and send a remote execution request with the obtained data to a selected remote agent for execution of the instruction, and a predictor having a plurality of entries each to store a prediction as to whether an instruction associated with the entry is to be contended during execution;
- a directory having a plurality of entries each to store tag information regarding a cacheline stored in a cache of the system, each entry further including a contention vector having a plurality of core fields each associated with a core, each core field including a first indicator and a second indicator each to indicate access of the cacheline via an atomic operation by the associated core within a time period; and
- a dynamic random access memory (DRAM) coupled to the directory and the processor.
Dependent claims: 18, 19, 20
Specification