System and Method for NUMA-Aware Locking Using Lock Cohorts
First Claim
1. A method, comprising:
performing by a computer:
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock, wherein the shared lock controls access to a critical section of code or a shared resource by concurrently executing threads of the application, and wherein only one thread can hold the shared lock at a time;
a thread of the application acquiring the shared lock, wherein the thread is executing on one of a plurality of processor cores in a cluster of processor cores that share a memory, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores on which threads of the multithreaded application are executing;
in response to acquiring the shared lock, the thread:
accessing the critical section of code or shared resource; and
subsequent to said accessing:
determining whether any other threads of the application that are executing on a processor core in the cluster of processor cores are waiting to access the critical section of code or shared resource; and
in response to determining that at least one other thread of the application that is executing on a processor core in the cluster of processor cores is waiting to acquire the shared lock, passing ownership of a cluster-specific lock that is associated with the critical section of code or shared resource to another thread of the application that is executing on a processor core in the cluster of processor cores and that is waiting to access the critical section of code or shared resource without releasing the shared lock, wherein said passing allows the other thread to gain access to the critical section of code or shared resource.
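The acquire and release steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `CohortLock`, `_Cluster`, `hammer`, and `max_handoffs` are hypothetical, the caller supplies its own cluster id (Python has no portable call for the current NUMA node), and the bound on consecutive hand-offs is an added assumption that the claim does not recite. The sketch relies on the documented property that a `threading.Lock` may be released by a thread other than the one that acquired it.

```python
import threading


class _Cluster:
    """Per-cluster state (hypothetical helper for this sketch)."""

    def __init__(self):
        self.lock = threading.Lock()   # cluster-specific lock
        self.meta = threading.Lock()   # protects `waiters`
        self.waiters = 0               # threads of this cluster queued on `lock`
        self.owns_shared = False       # guarded by `lock`
        self.handoffs = 0              # guarded by `lock`


class CohortLock:
    """One shared lock plus one cluster-specific lock per cluster."""

    def __init__(self, num_clusters, max_handoffs=64):
        self._shared = threading.Lock()  # may be released by any thread
        self._clusters = [_Cluster() for _ in range(num_clusters)]
        self._max_handoffs = max_handoffs  # assumption: bound on local passes

    def acquire(self, cluster_id):
        c = self._clusters[cluster_id]
        with c.meta:
            c.waiters += 1             # announce that this cluster has a waiter
        c.lock.acquire()
        with c.meta:
            c.waiters -= 1
        if not c.owns_shared:          # nothing was passed to us: take the shared lock
            self._shared.acquire()
            c.owns_shared = True
            c.handoffs = 0

    def release(self, cluster_id):
        c = self._clusters[cluster_id]
        with c.meta:
            cohort_waiting = c.waiters > 0
        if cohort_waiting and c.handoffs < self._max_handoffs:
            c.handoffs += 1            # keep the shared lock; pass within the cluster
        else:
            c.owns_shared = False
            self._shared.release()     # let other clusters compete
        c.lock.release()


def hammer(lock, num_clusters, threads_per_cluster, iterations):
    """Increment a shared counter under the lock and return the final value."""
    state = {"count": 0}

    def worker(cluster_id):
        for _ in range(iterations):
            lock.acquire(cluster_id)
            state["count"] += 1        # critical section
            lock.release(cluster_id)

    threads = [
        threading.Thread(target=worker, args=(cid,))
        for cid in range(num_clusters)
        for _ in range(threads_per_cluster)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]
```

When a waiter from the same cluster exists, `release` frees only the cluster-specific lock, so the next holder finds `owns_shared` already set and enters the critical section without touching the shared lock. Without some bound such as `max_handoffs`, a busy cluster could keep the shared lock indefinitely and starve the others.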
Abstract
The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock.
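To make the effect of relaxed ordering on lock migration concrete, the following Python sketch counts how often lock ownership moves between NUMA nodes under strict FIFO ordering and under a simplified cohort ordering. It is a static model rather than the described system: all requests are assumed to be queued up front, and the function names and the `max_handoffs` bound are hypothetical.

```python
from collections import deque


def count_migrations(order):
    """Number of times consecutive lock holders run on different nodes."""
    return sum(1 for a, b in zip(order, order[1:]) if a != b)


def fifo_order(requests):
    """Strict FIFO: serve requests exactly in arrival order."""
    return list(requests)


def cohort_order(requests, max_handoffs=3):
    """Serve the oldest waiter, then up to `max_handoffs` waiters from its node."""
    pending = deque(requests)
    served = []
    while pending:
        node = pending.popleft()       # oldest waiter obtains the global lock
        served.append(node)
        handoffs = 0
        while handoffs < max_handoffs and node in pending:
            pending.remove(node)       # oldest waiter on the same node goes next
            served.append(node)
            handoffs += 1
    return served


requests = [0, 1, 0, 1, 0, 1, 0, 1]    # node id of each requesting thread
print(count_migrations(fifo_order(requests)))        # 7
print(count_migrations(cohort_order(requests, 3)))   # 1
print(count_migrations(cohort_order(requests, 1)))   # 3
```

In this model, a larger `max_handoffs` keeps ownership on one node longer and lowers the migration count, at the cost of making threads on other nodes wait longer than they would under FIFO.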
20 Claims
1. A method, comprising:
performing by a computer:
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock, wherein the shared lock controls access to a critical section of code or a shared resource by concurrently executing threads of the application, and wherein only one thread can hold the shared lock at a time;
a thread of the application acquiring the shared lock, wherein the thread is executing on one of a plurality of processor cores in a cluster of processor cores that share a memory, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores on which threads of the multithreaded application are executing;
in response to acquiring the shared lock, the thread:
accessing the critical section of code or shared resource; and
subsequent to said accessing:
determining whether any other threads of the application that are executing on a processor core in the cluster of processor cores are waiting to access the critical section of code or shared resource; and
in response to determining that at least one other thread of the application that is executing on a processor core in the cluster of processor cores is waiting to acquire the shared lock, passing ownership of a cluster-specific lock that is associated with the critical section of code or shared resource to another thread of the application that is executing on a processor core in the cluster of processor cores and that is waiting to access the critical section of code or shared resource without releasing the shared lock, wherein said passing allows the other thread to gain access to the critical section of code or shared resource.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system, comprising:
a plurality of processor core clusters, each of which comprises two or more processor cores that support multithreading and that share a local memory;
a system memory coupled to the plurality of processor core clusters;
wherein the system memory stores program instructions that when executed on one or more processor cores in the plurality of processor core clusters cause the one or more processor cores to perform:
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock, wherein the shared lock controls access to a critical section of code or a shared resource by concurrently executing threads of the application, and wherein only one thread can hold the shared lock at a time;
a thread of the application acquiring the shared lock, wherein the thread is executing on one of a plurality of processor cores in a cluster of processor cores that share a memory, and wherein the cluster of processor cores is one of two or more clusters of processor cores on which threads of the multithreaded application are executing;
in response to acquiring the shared lock, the thread:
accessing the critical section of code or shared resource; and
subsequent to said accessing:
determining whether any other threads of the application that are executing on a processor core in the cluster of processor cores are waiting to access the critical section of code or shared resource; and
in response to determining that at least one other thread of the application that is executing on a processor core in the cluster of processor cores is waiting to acquire the shared lock, passing ownership of a cluster-specific lock that is associated with the critical section of code or shared resource to another thread of the application that is executing on a processor core in the cluster of processor cores and that is waiting to access the critical section of code or shared resource without releasing the shared lock, wherein said passing allows the other thread to gain access to the critical section of code or shared resource.
Dependent claims: 13, 14, 15, 16
17. A non-transitory, computer-readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to perform:
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock, wherein the shared lock controls access to a critical section of code or a shared resource by concurrently executing threads of the application, and wherein only one thread can hold the shared lock at a time;
a thread of the application acquiring the shared lock, wherein the thread is executing on one of a plurality of processor cores in a cluster of processor cores that share a memory, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores on which threads of the multithreaded application are executing;
in response to acquiring the shared lock, the thread:
accessing the critical section of code or shared resource; and
subsequent to said accessing:
determining whether any other threads of the application that are executing on a processor core in the cluster of processor cores are waiting to access the critical section of code or shared resource; and
in response to determining that at least one other thread of the application that is executing on a processor core in the cluster of processor cores is waiting to acquire the shared lock, passing ownership of a cluster-specific lock that is associated with the critical section of code or shared resource to another thread of the application that is executing on a processor core in the cluster of processor cores and that is waiting to access the critical section of code or shared resource without releasing the shared lock, wherein said passing allows the other thread to gain access to the critical section of code or shared resource.
Dependent claims: 18, 19, 20
Specification