System and method for implementing hierarchical queue-based locks using flat combining

US 8,458,721 B2
Filed: 06/02/2011
Issued: 06/04/2013
Est. Priority Date: 06/02/2011
Status: Active Grant

Abstract

The system and methods described herein may be used to implement a scalable, hierarchical, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock.
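The posting and combining steps described in the abstract can be illustrated with a minimal, single-threaded Python sketch. The `Node` layout, the field names, and the helper functions are assumptions made for illustration and are not taken from the patent; how a thread is chosen to act as combiner is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One publication-list slot owned by a single thread (hypothetical layout)."""
    thread_id: int
    request_pending: bool = False   # written by the owning thread with a plain store
    granted: bool = False           # lock ownership indicator the owner would spin on
    next: Optional["Node"] = None   # successor in the logical request queue


def post_request(node: Node) -> None:
    # A plain (non-atomic) write into the thread's own node.
    node.request_pending = True


def build_local_queue(publication_list: List[Node],
                      combiner: Node) -> Tuple[Node, Node]:
    """Chain the combiner's node and every other pending node; return (head, tail).

    The queue is logical: it links the publication-list nodes themselves
    instead of copying them.
    """
    head = tail = combiner
    combiner.request_pending = False
    combiner.next = None
    for node in publication_list:
        if node is not combiner and node.request_pending:
            node.request_pending = False   # collected by this combining pass
            node.next = None
            tail.next = node
            tail = node
    return head, tail


def queue_ids(head: Optional[Node]) -> List[int]:
    """Walk the successor pointers and list the thread ids in queue order."""
    ids = []
    while head is not None:
        ids.append(head.thread_id)
        head = head.next
    return ids


# One cluster with four threads; threads 0, 1 and 3 request the lock.
cluster = [Node(i) for i in range(4)]
for i in (0, 1, 3):
    post_request(cluster[i])

# Thread 1 acts as the combiner for this pass.
head, tail = build_local_queue(cluster, combiner=cluster[1])
print(queue_ids(head))  # [1, 0, 3]
```

The combiner's node comes first in this sketch, followed by the other requesters in publication-list order; that ordering is a choice made here, not one stated in the abstract.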

20 Claims

1. A method, comprising:
- performing by a computer:
  
  beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock;
  
  a thread of the application executing on one of a plurality of processor cores in a cluster of processor cores that share a memory posting a request to acquire the shared lock in a publication list for the cluster using a non-atomic write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores;
  
  the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue;
  
  the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores;
  
  the thread waiting for an indication that it has been granted the shared lock; and
  
  in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, where said waiting comprises the thread repeatedly reading the value of an indicator of lock ownership in the node associated with the thread until the value of the indicator indicates that the thread has been granted ownership of the shared lock.
  - 3. The method of claim 1, wherein the local lock acquisition request queue is a logical queue in which the nodes of the queue are shared with the nodes of the publication list for the cluster.
  - 4. The method of claim 1, wherein the global lock acquisition request queue is a logical queue in which the nodes of the queue are shared with the nodes of one or more publication lists for one or more clusters of processor cores.
  - 5. The method of claim 1, wherein said posting a request comprises the thread writing a particular value to an indicator of a pending request in the node associated with the thread;
    - and wherein said building comprises the thread traversing the publication list to identify the one or more other nodes that are associated with threads that have posted requests to acquire the shared lock.
  - 6. The method of claim 5, wherein said traversing is performed two or more times prior to said splicing, and wherein the number of times said traversing is performed is dependent on a heuristic that considers the effectiveness of one or more previous local lock acquisition request queue building operations performed by the thread.
  - 7. The method of claim 1, wherein said splicing comprises atomically replacing the value of a pointer that identifies the tail node of the global lock acquisition request queue with the value of a pointer that identifies the tail node of the local lock acquisition request queue.
  - 8. The method of claim 1, wherein said splicing comprises replacing the value of a pointer in the tail node of the global lock acquisition request queue that identifies the next node in the global lock acquisition request queue with the value of a pointer that identifies the head node of the local lock acquisition request queue.
  - 9. The method of claim 1, further comprising:
    - a second thread of the application determining whether one or more local lock acquisition request queues previously built by the second thread were of a length shorter than a pre-determined minimum target length; and
      
      in response to determining that the one or more local lock acquisition request queues previously built by the second thread were of a length shorter than the pre-determined minimum target length, the second thread posting a request to acquire the shared lock directly to the global lock acquisition request queue;
      
      wherein posting the request directly to the global lock acquisition request queue comprises the second thread using an atomic operation to insert a node associated with the second thread as a new tail node of the global lock acquisition request queue.
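The two splicing steps recited in claims 7 and 8 can be sketched as follows. This is an illustrative Python model only: the `GlobalQueue` class and its `swap_tail` method are hypothetical names, and a `threading.Lock` stands in for the atomic swap instruction that real hardware would provide.

```python
import threading
from typing import Optional, Tuple


class Node:
    def __init__(self, thread_id: int):
        self.thread_id = thread_id
        self.next: Optional["Node"] = None


class GlobalQueue:
    """Holds the tail pointer of the global request queue (hypothetical name)."""

    def __init__(self) -> None:
        self.tail: Optional[Node] = None
        self._guard = threading.Lock()   # assumption: emulates an atomic swap

    def swap_tail(self, new_tail: Node) -> Optional[Node]:
        # Atomically install new_tail and return the previous tail.
        with self._guard:
            old, self.tail = self.tail, new_tail
            return old


def splice(queue: GlobalQueue, local_head: Node, local_tail: Node) -> Optional[Node]:
    # Claim 7: the global tail pointer now identifies the local tail node.
    prev_tail = queue.swap_tail(local_tail)
    # Claim 8: the previous tail's next pointer now identifies the local head.
    if prev_tail is not None:
        prev_tail.next = local_head
    # None means the global queue was empty; handling that case is outside this sketch.
    return prev_tail


def chain(*ids: int) -> Tuple[Node, Node]:
    """Build an already-linked local queue and return (head, tail)."""
    nodes = [Node(i) for i in ids]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0], nodes[-1]


gq = GlobalQueue()
a_head, a_tail = chain(1, 0, 3)    # sub-queue built in one cluster
b_head, b_tail = chain(12, 10)     # sub-queue built in another cluster
splice(gq, a_head, a_tail)
splice(gq, b_head, b_tail)

order, n = [], a_head
while n is not None:
    order.append(n.thread_id)
    n = n.next
print(order)  # [1, 0, 3, 12, 10]
```

Because only the tail swap has to be atomic, a whole sub-queue joins the global queue for the cost of one atomic operation, and nodes from the same cluster stay adjacent in the global order.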

10. A system comprising:
- a plurality of processor core clusters, each of which comprises two or more processor cores that support multithreading and that share a local memory;
  
  a system memory coupled to the one or more processors;
  
  wherein the system memory stores program instructions that when executed on one or more processor cores in the plurality of processor core clusters causes the one or more processor cores to perform:
  
  a thread executing on one of the plurality of processor cores in a given cluster of processor cores posting a request to acquire a shared lock in a publication list for the given cluster using a non-atomic write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock;
  
  the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the given cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue;
  
  the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores;
  
  the thread waiting for an indication that it has been granted the shared lock; and
  
  in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The system of claim 10, where said waiting comprises the thread repeatedly reading the value of an indicator of lock ownership in the node associated with the thread until the value of the indicator indicates that the thread has been granted ownership of the shared lock.
  - 12. The system of claim 10, wherein at least one of the local lock acquisition request queue and the global lock acquisition request queue is a logical queue in which the nodes of the queue are shared with the nodes of the publication list for the given cluster.
  - 13. The system of claim 10, wherein said posting a request comprises the thread writing a particular value to an indicator of a pending request in the node associated with the thread;
    - wherein said building comprises the thread traversing the publication list one or more times to identify the one or more other nodes that are associated with threads that have posted requests to acquire the shared lock; and
      
      wherein the number of times said traversing is performed is dependent on a heuristic that considers the effectiveness of one or more previous local lock acquisition request queue building operations performed by the thread.
  - 14. The system of claim 10, wherein said splicing comprises:
    - atomically replacing the value of a pointer that identifies the tail node of the global lock acquisition request queue with the value of a pointer that identifies the tail node of the local lock acquisition request queue; and
      
      replacing the value of a pointer in the tail node of the global lock acquisition request queue that identifies the next node in the global lock acquisition request queue with the value of a pointer that identifies the head node of the local lock acquisition request queue.
  - 15. The system of claim 10, wherein when executed on the one or more processor cores in the plurality of processor core clusters the program instructions further cause the one or more processor cores to perform:
    - a second thread of the application determining whether one or more local lock acquisition request queues previously built by the second thread were of a length shorter than a pre-determined minimum target length; and
      
      in response to determining that the one or more local lock acquisition request queues previously built by the second thread were of a length shorter than the pre-determined minimum target length, the second thread posting a request to acquire the shared lock directly to the global lock acquisition request queue;
      
      wherein posting the request directly to the global lock acquisition request queue comprises the second thread using an atomic operation to insert a node associated with the second thread as a new tail node of the global lock acquisition request queue.
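Claims 13 and 15 leave the heuristic itself open: they say only that the number of traversals depends on how effective earlier queue-building operations were, and that a thread whose earlier local queues fell short of a minimum target length posts directly to the global queue. The Python sketch below shows one possible policy. Every name, threshold, and rule in it is an assumption chosen for illustration and is not taken from the patent.

```python
from collections import deque


class CombinerPolicy:
    """Per-thread record of recent local-queue lengths (hypothetical policy)."""

    def __init__(self, min_target_len: int = 3, max_passes: int = 4,
                 history: int = 3) -> None:
        self.min_target_len = min_target_len
        self.max_passes = max_passes
        self.recent = deque(maxlen=history)   # lengths of the last few local queues

    def record(self, queue_len: int) -> None:
        self.recent.append(queue_len)

    def passes(self) -> int:
        # Assumed rule: each recent build that reached the target length
        # earns one extra traversal of the publication list, up to a cap.
        effective = sum(1 for n in self.recent if n >= self.min_target_len)
        return min(self.max_passes, 1 + effective)

    def should_post_directly(self) -> bool:
        # Assumed rule: bypass combining once a full history window of
        # builds has stayed below the minimum target length.
        return (len(self.recent) == self.recent.maxlen
                and all(n < self.min_target_len for n in self.recent))


policy = CombinerPolicy(min_target_len=3, max_passes=4, history=3)
for length in (5, 4, 1):
    policy.record(length)
busy_passes = policy.passes()                  # two effective builds, so three passes
busy_direct = policy.should_post_directly()    # False: combining is still paying off

for length in (1, 2):
    policy.record(length)                      # window is now (1, 1, 2)
quiet_passes = policy.passes()
quiet_direct = policy.should_post_directly()   # True: fall back to a direct atomic insert
print(busy_passes, busy_direct, quiet_passes, quiet_direct)
```

Under low contention a combining pass finds few requests to batch, so a single atomic insert at the global tail costs less than traversing the publication list; that trade-off is what a policy of this kind tries to capture.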

16. A non-transitory, computer readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to perform:
- beginning execution of a multithreaded application that comprises one or more requests to acquire a shared lock;
  
  a thread of the application executing on one of a plurality of processor cores in a cluster of processor cores that share a memory posting a request to acquire the shared lock in a publication list for the cluster using a non-atomic write operation, wherein the publication list comprises a plurality of nodes, each of which is associated with a respective thread that accesses the shared lock, and wherein the cluster of processor cores is one of a plurality of clusters of processor cores;
  
  the thread building a local lock acquisition request queue comprising the node associated with the thread and one or more other nodes of the publication list for the cluster, wherein each of the one or more other nodes is associated with a respective thread that has posted a request to acquire the shared lock, and wherein the local lock acquisition request queue is an ordered queue in which each node of the queue comprises a pointer to its successor node in the queue;
  
  the thread splicing the local lock acquisition queue into a global lock acquisition request queue for the shared lock as a sub-queue of the global lock acquisition request queue, wherein the global lock acquisition request queue comprises one or more other sub-queues, each of which comprises one or more nodes associated with threads executing on a processor core in a particular cluster of processor cores;
  
  the thread waiting for an indication that it has been granted the shared lock; and
  
  in response to the thread receiving an indication that it has been granted the shared lock, the thread accessing a critical section or shared resource that is protected by the shared lock.
- Dependent claims (17, 18, 19, 20):
  - 17. The storage medium of claim 16, where said waiting comprises the thread repeatedly reading the value of an indicator of lock ownership in the node associated with the thread until the value of the indicator indicates that the thread has been granted ownership of the shared lock.
  - 18. The storage medium of claim 16, wherein at least one of the local lock acquisition request queue and the global lock acquisition request queue is a logical queue in which the nodes of the queue are shared with the nodes of the publication list for the given cluster.
  - 19. The storage medium of claim 16, wherein said posting a request comprises the thread writing a particular value to an indicator of a pending request in the node associated with the thread;
    - wherein said building comprises the thread traversing the publication list one or more times to identify the one or more other nodes that are associated with threads that have posted requests to acquire the shared lock; and
      
      wherein the number of times said traversing is performed is dependent on a heuristic that considers the effectiveness of one or more previous local lock acquisition request queue building operations performed by the thread.
  - 20. The storage medium of claim 16, wherein said splicing comprises:
    - atomically replacing the value of a pointer that identifies the tail node of the global lock acquisition request queue with the value of a pointer that identifies the tail node of the local lock acquisition request queue; and
      
      replacing the value of a pointer in the tail node of the global lock acquisition request queue that identifies the next node in the global lock acquisition request queue with the value of a pointer that identifies the head node of the local lock acquisition request queue.
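The waiting step of claim 17, in which each thread repeatedly reads the ownership indicator in its own node, can be sketched with Python threads. The claims do not describe how the lock is released; the hand-off to the successor node below is an assumption in the style of a conventional queue lock, and the plain boolean flag relies on CPython's interpreter lock rather than on real memory-ordering guarantees.

```python
import threading
import time
from typing import List, Optional


class Node:
    def __init__(self, thread_id: int):
        self.thread_id = thread_id
        self.granted = False               # lock ownership indicator
        self.next: Optional["Node"] = None


entered: List[int] = []                    # order in which threads ran the critical section


def worker(node: Node) -> None:
    # Spin on the indicator in the thread's own node (local spinning).
    while not node.granted:
        time.sleep(0)                      # yield so other threads can run
    entered.append(node.thread_id)         # critical section
    # Assumed release step: pass ownership to the successor, if any.
    node.granted = False
    if node.next is not None:
        node.next.granted = True


# A three-node queue in the order 1, 0, 3.
nodes = [Node(i) for i in (1, 0, 3)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b

threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
for t in threads:
    t.start()
nodes[0].granted = True                    # grant the lock to the head of the queue
for t in threads:
    t.join()
print(entered)  # [1, 0, 3]
```

Each waiter polls only its own node, so in a real implementation the spinning stays in that thread's cache until the predecessor writes the grant.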

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Marathe, Virendra J.; Shavit, Nir N.; Dice, David
Primary Examiner(s): NGUYEN, VAN H
Application Number: US13/152,079
Publication Number: US 20120311606A1
Time in Patent Office: 733 Days
Field of Search: 718/100, 718/107, 719/310, 719/312, 719/313, 719/314
US Class Current: 718/107
CPC Class Codes: G06F 9/526 (Mutual exclusion algorithms)
