Apparatus and method for low-overhead synchronous page table updates
Abstract
An apparatus and method are described for low overhead synchronous page table updates. For example, one embodiment of a processor comprises: a set of one or more cores to execute instructions and process data; a translation lookaside buffer (TLB) comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; locking circuitry to allow a thread to lock a first page table entry (PTE) in the TLB to ensure that only one thread can modify the first PTE at a time, wherein the TLB is to modify the first PTE upon the thread acquiring the lock; a PTE invalidation circuit to execute a PTE invalidate instruction on a first core to invalidate the first PTE in other TLBs of other cores, the PTE invalidation circuit, responsive to execution of the PTE invalidate instruction, to determine a number of other TLBs of other cores which need to be notified of the PTE invalidation, transmit PTE invalidate messages to the other TLBs, and wait for responses; and the locking circuitry to release the lock on the first PTE responsive to receiving responses from all of the other TLBs.
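The sequence in the abstract (lock the PTE, modify it, notify the other TLBs, wait for their responses, release the lock) can be modeled in software. This is a minimal single-threaded sketch, not the patented circuitry: the `TLB` class, the `update_pte` function, the list of TLBs indexed by core id, and the rule that the TLBs "which need to be notified" are exactly the other TLBs currently caching the page are all assumptions made for illustration. Each remote invalidation here returns its response synchronously.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class TLB:
    """Hypothetical software stand-in for one core's TLB."""
    core_id: int
    entries: dict = field(default_factory=dict)  # virtual page -> physical frame

    def invalidate(self, vpage):
        # Drop the cached translation, if present, and acknowledge.
        self.entries.pop(vpage, None)
        return ("ack", self.core_id)


def update_pte(page_table, pte_locks, tlbs, initiator, vpage, new_frame):
    """Lock the PTE, modify it, invalidate it in other TLBs, then unlock.

    Returns the number of other TLBs that were notified.
    """
    lock = pte_locks.setdefault(vpage, threading.Lock())
    lock.acquire()  # only one thread may modify this PTE at a time
    try:
        # The PTE is modified only after the lock is held.
        page_table[vpage] = new_frame
        tlbs[initiator].entries[vpage] = new_frame
        # Assumption: the TLBs "which need to be notified" are the other
        # TLBs that currently cache a translation for vpage.
        targets = [t for t in tlbs
                   if t.core_id != initiator and vpage in t.entries]
        # Send an invalidate message to each target and collect its response.
        responses = [t.invalidate(vpage) for t in targets]
        return len(responses)
    finally:
        # Released after all responses are in (or if an error aborts the update).
        lock.release()
```

For example, with three cores where cores 0 and 1 cache page `0x10`, calling `update_pte(page_table, {}, tlbs, 0, 0x10, 0x200)` on core 0 notifies only core 1's TLB and returns 1.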
19 Claims
1. A processor comprising:
a plurality of cores to execute instructions and process data;
one or more translation lookaside buffers (TLBs) comprising a plurality of entries to cache virtual-to-physical address translations usable by at least one of the plurality of cores when executing the instructions;
a page table entry (PTE) invalidation circuit to execute a PTE invalidate instruction on a first core to invalidate a first PTE in TLBs of other cores, the PTE invalidation circuit, responsive to execution of the PTE invalidate instruction, to responsively determine a number of other TLBs of other cores which need to be notified of the PTE invalidation, transmit PTE invalidate messages to the other TLBs, and wait for responses;
locking circuitry to allow a thread to lock the first PTE in the first TLB to ensure that only one thread can modify the first PTE at a time; and
an invalidation PTE state machine circuit to be programmed with a count value initially set to the number of other TLBs which need to be notified, the invalidation PTE state machine circuit to decrement the count value upon receiving each response from each of the other TLBs, the locking circuitry to release the lock when the count value has been decremented to a threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7.
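The invalidation PTE state machine of claim 1 behaves as a down-counter: it is loaded with the number of TLBs to notify, decremented once per response, and signals that the lock may be released once the count reaches the threshold. The following is a hypothetical software model, not the claimed circuit; the state names, the `InvalidationCounter` class, and the default threshold of zero are assumptions, since the claim only requires "a threshold value".

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WAITING = "waiting"
    DONE = "done"


class InvalidationCounter:
    """Hypothetical model of the invalidation PTE state machine."""

    def __init__(self, threshold=0):
        self.threshold = threshold  # assumed value; the claim leaves it open
        self.count = None
        self.state = State.IDLE

    def program(self, num_targets):
        # Load the count with the number of other TLBs that must be notified.
        self.count = num_targets
        self.state = (State.DONE if num_targets <= self.threshold
                      else State.WAITING)
        return self.state

    def on_response(self):
        # Each response from another TLB decrements the count by one.
        if self.state is not State.WAITING:
            raise RuntimeError("response received while not waiting")
        self.count -= 1
        if self.count <= self.threshold:
            self.state = State.DONE  # locking circuitry may now release the lock
        return self.state

    def lock_may_be_released(self):
        return self.state is State.DONE
```

After `program(3)`, the first two responses leave the machine in `WAITING` and the third moves it to `DONE`. Programming a count of zero completes immediately, which covers the case where no other TLB needs to be notified.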
8. A method comprising:
caching a plurality of virtual-to-physical address translations in a translation lookaside buffer (TLB) usable by the set of one or more cores when executing the instructions;
locking a first page table entry (PTE) in the TLB to ensure that only one thread can modify the first PTE at a time, wherein the TLB is to modify the first PTE upon acquiring the lock;
executing a PTE invalidate instruction on a first core to invalidate the first PTE in other TLBs of other cores, the PTE invalidation circuit, responsive to execution of the PTE invalidate instruction, to responsively determine a number of other TLBs of other cores which need to be notified of the PTE invalidation, transmit PTE invalidate messages to the other TLBs, and wait for responses; and
releasing the lock on the first PTE responsive to receiving responses from all of the other TLBs, wherein an invalidation PTE state machine circuit to be programmed with a count value initially set to the number of other TLBs which need to be notified, the invalidation PTE state machine circuit to decrement the count value upon receiving each response from each of the other TLBs, and wherein the lock on the first PTE is released when the count value has been decremented to a threshold value.
Dependent claims: 9, 10, 11, 12, 13.
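In the method of claim 8, responses from the other cores arrive asynchronously while the initiating thread holds the PTE lock. The Python sketch below models that with one thread per remote TLB and a condition variable standing in for the counting state machine. It is an illustration under stated assumptions, not the patented hardware: TLBs are plain dicts, the per-PTE lock is a `threading.Lock`, the threshold defaults to zero, and `invalidate_and_wait` is a hypothetical name.

```python
import random
import threading
import time


def invalidate_and_wait(pte_lock, page_table, remote_tlbs, vpage, new_frame,
                        threshold=0):
    """Hold the PTE lock until the response count reaches the threshold.

    Returns the final count value.
    """
    cond = threading.Condition()
    # Count programmed with the number of other TLBs to be notified.
    state = {"count": len(remote_tlbs)}

    def remote_core(tlb):
        # Simulated remote core: invalidate its cached copy, then respond.
        time.sleep(random.uniform(0, 0.005))  # responses arrive in any order
        tlb.pop(vpage, None)
        with cond:
            state["count"] -= 1  # decrement once per response
            cond.notify_all()

    with pte_lock:  # only one thread may modify the PTE at a time
        page_table[vpage] = new_frame  # modify the PTE after acquiring the lock
        workers = [threading.Thread(target=remote_core, args=(tlb,))
                   for tlb in remote_tlbs]
        for w in workers:
            w.start()  # "transmit" one invalidate message per remote TLB
        with cond:
            cond.wait_for(lambda: state["count"] <= threshold)
        # Leaving this block releases the PTE lock.
    for w in workers:
        w.join()
    return state["count"]
```

Because the lock is held across `wait_for`, any other thread that tries to acquire `pte_lock` to modify the same entry blocks until every remote TLB has dropped its stale translation (with the assumed threshold of zero).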
14. A system comprising:
a memory to store instructions and data;
a processor to execute the instructions and process the data;
a graphics processor to perform graphics operations in response to graphics instructions;
a network interface to receive and transmit data over a network;
an interface for receiving user input from a mouse or cursor control device, the plurality of cores executing the instructions and processing the data responsive to the user input;
the processor comprising:
a plurality of cores to execute instructions and process data;
a translation lookaside buffer (TLB) comprising a plurality of entries to cache virtual-to-physical address translations usable by at least one of the plurality of cores when executing the instructions;
locking circuitry to allow a thread to lock a first page table entry (PTE) in the TLB to ensure that only one thread can modify the first PTE at a time, wherein the TLB is to modify the first PTE upon the thread acquiring the lock;
a PTE invalidation circuit to execute a PTE invalidate instruction on a first core to invalidate the first PTE in other TLBs of other cores, the PTE invalidation circuit, responsive to execution of the PTE invalidate instruction, to responsively determine a number of other TLBs of other cores which need to be notified of the PTE invalidation, transmit PTE invalidate messages to the other TLBs, and wait for responses;
the locking circuitry to release the lock on the first PTE responsive to receiving responses from all of the other TLBs; and
an invalidation PTE state machine circuit to be programmed with a count value initially set to the number of other TLBs which need to be notified, the invalidation PTE state machine circuit to decrement the count value upon receiving each response from each of the other TLBs, the locking circuitry to release the lock when the count value has been decremented to a threshold value.
Dependent claims: 15, 16, 17, 18, 19.
Specification