FAST AND LINEARIZABLE CONCURRENT PRIORITY QUEUE VIA DYNAMIC AGGREGATION OF OPERATIONS
Abstract
Embodiments of the invention improve parallel performance in multi-threaded applications by serializing concurrent priority queue operations, improving throughput. An embodiment uses a synchronization protocol and aggregation technique that enables a single thread to handle multiple operations in a cache-friendly fashion while threads awaiting the completion of those operations spin-wait on a local stack variable; that is, each waiting thread continues to poll the stack variable until it has been set or cleared appropriately, rather than relying on an interrupt notification. An enqueue/dequeue (push/pop) optimization re-orders aggregated operations to execute two operations for the price of one in some cases. Other embodiments are described and claimed.
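The spin-wait described above can be sketched as follows. This is an illustrative sketch, not the patent's code: the names `OpNode`, `handler`, and `submit_and_wait` are assumptions, and a plain Python attribute stands in for the local stack variable the waiter polls.

```python
# Sketch: a waiting thread polls a flag on its own operation record until
# the handler thread marks it done, instead of blocking on a notification.
import threading

class OpNode:
    def __init__(self, value):
        self.value = value
        self.result = None
        self.done = False        # flag the requesting thread spins on

def handler(node):
    node.result = node.value + 1  # perform the aggregated operation
    node.done = True              # release the waiter

def submit_and_wait(node):
    t = threading.Thread(target=handler, args=(node,))
    t.start()
    while not node.done:          # spin-wait on the node's local flag
        pass
    t.join()
    return node.result

print(submit_and_wait(OpNode(41)))  # prints 42
```

In a C/C++ implementation the flag would be an atomic variable with acquire/release ordering; Python's global interpreter lock makes the plain attribute safe here.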
29 Claims
1. A computer implemented system, comprising:
a processor coupled to memory, the processor configured to execute multi-threaded application programs;

a multi-threaded application executing on the processor, the multi-threaded application program configured to utilize a concurrent entity, and wherein a plurality of threads of the multi-threaded application are configured to generate a plurality of operation nodes to operate upon the concurrent entity concurrently with other threads;

a synchronization and aggregation logic component coupled to the multi-threaded application and configured to accept operation nodes from the plurality of threads, each operation node corresponding to a single thread, the accepted operation nodes to be placed in a temporary list, the operation nodes defining an operation to perform on the concurrent entity, wherein only one thread, known as a handler thread, is permitted to operate on the temporary list to perform the operations on the concurrent entity, and wherein each thread is permitted to provide only one operation node to the temporary list and waits until the corresponding operation node has been processed by the handler thread before being permitted to provide another operation node to a second temporary list; and

the concurrent entity stored in the memory accessible to the multi-threaded application program.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
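The aggregation protocol of claim 1 can be sketched as below. All names (`AggregatedCounter`, `OpNode`, `execute`) are illustrative assumptions; a plain counter stands in for the claimed concurrent entity, and a `threading.Lock` stands in for the hardware-atomic list operations an actual implementation would use. Each thread publishes exactly one operation node; the thread that starts a batch takes the handler role, drains the temporary list, and applies every node, while the other threads spin on their own node's flag.

```python
import threading

class OpNode:
    def __init__(self, op, value=None):
        self.op, self.value = op, value
        self.result, self.done = None, False

class AggregatedCounter:              # stands in for the concurrent entity
    def __init__(self):
        self.value = 0
        self.pending = []             # the temporary list of operation nodes
        self.lock = threading.Lock()  # stands in for atomic list updates

    def execute(self, node):
        with self.lock:
            self.pending.append(node)
            i_am_handler = len(self.pending) == 1  # first in: handler role
        if i_am_handler:
            with self.lock:           # atomically detach the batch
                batch, self.pending = self.pending, []
            for n in batch:           # handler processes every node
                if n.op == "add":
                    self.value += n.value
                n.result = self.value
                n.done = True         # release the waiting thread
        else:
            while not node.done:      # spin until the handler finishes it
                pass
        return node.result

counter = AggregatedCounter()
workers = [threading.Thread(target=counter.execute, args=(OpNode("add", 1),))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)  # prints 8
```

Note how handler election is per batch: a thread that appends to an empty list becomes the next handler, so no node is left unprocessed even if it arrives just after the previous handler detached its batch.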
9. A computer implemented system, comprising:
a processor coupled to memory, the processor configured to execute multi-threaded application programs;

a multi-threaded application executing on the processor, the multi-threaded application program configured to utilize a concurrent priority queue stored in the memory and configured to accept enqueue and dequeue operations, and wherein a plurality of threads of the multi-threaded application are configured to generate a plurality of operation nodes to operate upon the concurrent priority queue concurrently with other threads, and wherein each operation node comprises one of either an enqueue or dequeue operation;

an enqueue/dequeue optimization logic component coupled to the multi-threaded application and having access to the concurrent priority queue, wherein a handler thread is configured to operate on the temporary list to process both enqueue operation nodes as they apply to the concurrent priority queue as well as constant-time dequeue operations, prior to operating on non-constant-time dequeue operations, before re-sorting items in the concurrent priority queue, and wherein enqueue operations are performed before all non-constant time dequeue operations, and wherein the handler thread is further configured to provide a fully sorted concurrent priority queue to the multi-threaded application after performing all operation nodes and re-sorting the concurrent priority queue.

Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
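The batch-processing order of claim 9 can be sketched as follows. This is an assumed sketch, not the patented implementation: `process_batch` is an invented name, a `heapq` min-heap stands in for the concurrent priority queue, and the re-heapify stands in for the claimed re-sort. It shows the "two operations for the price of one" idea from the abstract: a pending push that is no larger than the heap's minimum can satisfy a pending pop directly, so neither operation ever touches the heap; remaining pushes are constant-time appends, and the non-constant-time pops run last, after one re-sort.

```python
import heapq

def process_batch(heap, ops):
    """Apply a batch of ('push', v) / ('pop',) operations to a min-heap."""
    pushes = sorted(op[1] for op in ops if op[0] == 'push')
    npops = sum(1 for op in ops if op[0] == 'pop')
    popped = []
    # Two for one: a pending push <= the current heap minimum satisfies a
    # pending pop without touching the heap at all.
    while npops and pushes and (not heap or pushes[0] <= heap[0]):
        popped.append(pushes.pop(0))
        npops -= 1
    heap.extend(pushes)     # remaining enqueues: constant-time appends
    heapq.heapify(heap)     # one re-sort for the whole batch
    for _ in range(npops):  # non-constant-time dequeues go last
        popped.append(heapq.heappop(heap))
    return popped

queue = [3, 5, 7]
heapq.heapify(queue)
print(process_batch(queue, [('push', 1), ('pop',), ('push', 4), ('pop',)]))
# prints [1, 3]
```

Here the push of 1 directly satisfies the first pop (1 is below the heap minimum of 3), so only one heap operation is paid for two of the four requests.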
18. A non-transitory machine accessible medium having instructions stored thereon, the instructions when executed on a machine, cause the machine to:
add operation nodes to a temporary list, by at least one thread of a plurality of threads executing in a multi-threaded application running on the machine;

assign a first thread that added an operation node to the temporary list a role as handler thread for the temporary list;

wait by the handler thread for a flag indicating processing of the temporary list is permitted to commence;

retrieve the temporary list and generate an initially empty second temporary list, by the handler thread, in an atomic operation, when the flag indicates processing of the temporary list is permitted to commence, wherein the second temporary list is to receive operation nodes by at least one thread when the at least one thread does not have an unprocessed operation node on the retrieved temporary list; and

process the temporary list by the handler thread into a concurrent entity.

Dependent claims: 19, 20, 21, 22, 23, 24
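The retrieve-and-replace step of claim 18 can be sketched as below. The names (`PendingOps`, `detach`) are assumptions, and a `threading.Lock` stands in for the single atomic operation the claim requires: the handler grabs the current temporary list and leaves a fresh, initially empty second list in its place, so threads whose nodes were already taken can publish their next operation node without waiting for the batch to finish.

```python
import threading

class PendingOps:
    """Temporary list of operation nodes with an atomic detach step."""
    def __init__(self):
        self.pending = []
        self._lock = threading.Lock()  # stands in for one atomic exchange

    def add(self, node):
        with self._lock:
            self.pending.append(node)

    def detach(self):
        """Retrieve the current list and install a new empty one."""
        with self._lock:
            batch, self.pending = self.pending, []  # the atomic swap
        return batch

ops = PendingOps()
for n in ("op1", "op2"):
    ops.add(n)
batch = ops.detach()       # handler takes the old list...
ops.add("op3")             # ...new nodes land on the second, empty list
print(batch, ops.pending)  # prints ['op1', 'op2'] ['op3']
```

A lock-free implementation would perform `detach` as a single atomic exchange of the list's head pointer with null, which is why the claim can require it to happen in one atomic operation.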
25. A non-transitory machine accessible medium having instructions stored thereon, the instructions when executed on a machine, cause the machine to:
process enqueue and dequeue operation nodes from a temporary list, the enqueue and dequeue operation nodes corresponding to operations to be performed on a concurrent priority queue associated with a multi-threaded application executing on the machine, the processing to be performed by a thread assigned to be a handler thread for the temporary list, and wherein the concurrent priority queue comprises an array data structure stored in memory, where, when non-empty, at least a portion of the array elements are sorted into a heap;

process operation nodes in the temporary list by the handler thread, wherein enqueue operation nodes in the temporary list are added to the concurrent priority queue in constant time, and non-constant time dequeue operations in the temporary list are delayed until all constant-time enqueue and dequeue operation nodes in the temporary list have been processed by the handler thread; and

re-sort the concurrent priority queue after the temporary list has been processed by the handler thread, when necessary.

Dependent claims: 26, 27, 28, 29
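The array layout of claim 25 can be sketched as follows, under assumed names (`BatchedPQ`, `mark`): the first `mark` elements of the backing array form a binary min-heap, enqueues append past the heap boundary in constant time, and the heap property is restored at most once per batch, only when appends are outstanding.

```python
import heapq

class BatchedPQ:
    """Array-backed queue: data[:mark] is a min-heap, the tail holds
    constant-time appends awaiting the next re-sort."""
    def __init__(self):
        self.data = []
        self.mark = 0

    def enqueue(self, v):
        self.data.append(v)       # O(1): no per-push sifting

    def resort(self):
        if self.mark < len(self.data):  # only if appends are outstanding
            heapq.heapify(self.data)
            self.mark = len(self.data)

    def dequeue_min(self):
        self.resort()                   # the non-constant-time path
        v = heapq.heappop(self.data)
        self.mark = len(self.data)
        return v

pq = BatchedPQ()
for v in (5, 1, 3):
    pq.enqueue(v)                       # three O(1) appends, no sorting yet
print([pq.dequeue_min() for _ in range(3)])  # prints [1, 3, 5]
```

Batching the enqueues this way replaces three O(log n) sift operations with one heapify pass, which is the cost the claim defers to the post-batch re-sort.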
Specification