System and method for implementing hierarchical queue-based locks using flat combining
First Claim
1. A method, comprising:
- performing by a computer;
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock;
a thread of the application executing on one of a plurality of processor cores in a cluster of processor cores that share a memory posting a request to acquire the shared lock in a publication list for the cluster using a non-atomic operation write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores;
the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue;
the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores;
the thread waiting for an indication that it has been granted the shared lock; and
in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock.
1 Assignment
0 Petitions
Accused Products
Abstract
The system and methods described herein may be used to implement a scalable, hierarchal, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock.
-
Citations
20 Claims
-
1. A method, comprising:
performing by a computer; beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock; a thread of the application executing on one of a plurality of processor cores in a cluster of processor cores that share a memory posting a request to acquire the shared lock in a publication list for the cluster using a non-atomic operation write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores; the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue; the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores; the thread waiting for an indication that it has been granted the shared lock; and in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
10. A system comprising:
-
a plurality of processor core clusters, each of which comprises two or more processor cores that support multithreading and that share a local memory; a system memory coupled to the one or more processors; wherein the system memory stores program instructions that when executed on one or more processor cores in the plurality of processor core clusters causes the one or more processor cores to perform; a thread executing on one of the plurality of processor cores in a given cluster of processor cores posting a request to acquire a shared lock in a publication list for the given cluster using a non-atomic operation write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock; the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the given cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue; the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores; the thread waiting for an indication that it has been granted the shared lock; and in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock. - View Dependent Claims (11, 12, 13, 14, 15)
-
-
16. A non-transitory, computer readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to perform:
-
beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock; a thread of the application executing on one of a plurality of processor cores in a cluster of processor cores that share a memory posting a request to acquire the shared lock in a publication list for the cluster using a non-atomic operation write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores; the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue; the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores; the thread waiting for an indication that it has been granted the shared lock; and in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock. - View Dependent Claims (17, 18, 19, 20)
-
Specification