Parallel priority queue utilizing parallel heap on many-core processors for accelerating priority-queue-based applications

US 10,095,556 B2
Filed: 12/19/2013
Issued: 10/09/2018
Est. Priority Date: 12/20/2012
Status: Active Grant

Abstract

Disclosed are various embodiments for a parallel priority queue implemented on one or more many-core processors and/or multi-core processors such as those in general-purpose graphics processing units (GPGPUs). According to various embodiments, a priority may be determined according to a timestamp of an item, such as an event or an entry, in a priority queue. A priority queue interface may comprise functions to insert and remove entries from the priority queue. Priority order of the entries may be maintained as the entries are inserted and removed from the queue.
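
As a rough illustration of the interface the abstract describes, the following is a minimal sequential sketch in Python, not the patented GPU implementation. It assumes a smaller timestamp means higher priority; the names `Entry`, `TimestampPriorityQueue`, `insert`, and `remove` are hypothetical and do not come from the patent.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class Entry:
    # Priority key (assumption: an earlier timestamp is served first).
    timestamp: float
    # Event data; excluded from comparisons so only the timestamp orders entries.
    payload: Any = field(compare=False)


class TimestampPriorityQueue:
    """Sequential stand-in for the insert/remove interface (hypothetical names)."""

    def __init__(self) -> None:
        self._heap: List[Entry] = []

    def insert(self, timestamp: float, payload: Any) -> None:
        # The binary heap keeps priority order as entries are inserted.
        heapq.heappush(self._heap, Entry(timestamp, payload))

    def remove(self) -> Entry:
        # Removes and returns the entry with the smallest timestamp.
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


pq = TimestampPriorityQueue()
for ts, name in [(3.0, "c"), (1.0, "a"), (2.0, "b")]:
    pq.insert(ts, name)

# Entries come out in timestamp order regardless of insertion order.
order = [pq.remove().payload for _ in range(len(pq))]
```

A parallel heap differs from this sketch in that each node holds several items and many insertions and deletions proceed at once; the sketch only shows the ordering contract that such a structure has to preserve.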

20 Claims

1. A system, comprising:
- a host processor comprising a central processing unit (CPU);
  
  a graphics processing unit (GPU) comprising a many-core architecture;
  
  kernel code executable by the GPU that, when executed by the GPU, causes the GPU to:
  
  execute a parallel heap manager and a priority queue application concurrently in the GPU by assigning at least one kernel function of the parallel heap manager to a first stream and at least one kernel function of the priority queue application to a second stream;
  
  implement, by the priority queue application in the first stream, a priority queue as a parallel heap where a plurality of operations performed on the priority queue are performed in parallel on the GPU; and
  
  maintain, by the parallel heap manager in the second stream, an order of priority as a plurality of queue entries are inserted and deleted from the priority queue; and
  
  host code executable by the host processor that, when executed, causes the host processor to:
  
  synchronize, by a controller implemented by the host processor, operations of the priority queue application and the parallel heap manager using a global barrier.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The system of claim 1, wherein the GPU further comprises a plurality of streaming multiprocessors (SPs) employed in a single instruction multiple thread (SIMT) architecture.
  - 3. The system of claim 2, wherein the parallel heap further comprises a plurality of parallel heaps, wherein each of the parallel heaps corresponds to one of the plurality of SPs for processing.
  - 4. The system of claim 1, wherein the kernel code and the host code are implemented in a compute unified device architecture (CUDA).
  - 5. The system of claim 1, wherein the GPU further comprises a general computing graphics processing unit (GCGPU) comprising the many-core architecture.
  - 6. The system of claim 1, wherein the priority is determined for each of the plurality of queue entries according to a time stamp for each of the plurality of queue entries.
  - 7. The system of claim 1, wherein the GPU further comprises a plurality of streaming multiprocessors (SPs) employed in a single instruction multiple thread (SIMT) architecture.
  - 8. The system of claim 1, wherein the plurality of operations synchronized include an insert operation and a delete operation.
  - 9. The system of claim 1, further comprising:
    - kernel code executable by the GPU that, when executed, causes the GPU to notify, by the priority queue application, the controller when one or more new items are ready for insertion into the parallel heap; and
      
      host code executable by the CPU that, when executed, causes the CPU to:
      
      suspend, by the controller, the priority queue application after the notification;
      
      request, by the controller, the parallel heap manager executed in the GPU to merge the one or more new items with items at a root node of the parallel heap;
      
      receive, by the controller, a plurality of R smallest items from the parallel heap manager after the parallel heap manager has completed the requested merge;
      
      resume, by the controller, the suspended priority queue application with the plurality of R smallest items; and
      
      request, by the controller, the parallel heap manager to begin a new delete-insert cycle to maintain the parallel heap.
  - 10. The system of claim 9, wherein resuming and the requesting of the new delete-insert cycle are performed concurrently.
  - 11. The system of claim 1, wherein the at least one kernel function of the parallel heap manager assigned to the first stream and the at least one kernel function of the priority queue application assigned to the second stream are executed in a first-in-first-out (FIFO) manner.
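
The merge step recited in claim 9 (new items are merged with the items at the root node, and the R smallest items are handed back to the application) can be sketched as follows. This is a simplified sequential model under stated assumptions: items are plain numeric priorities, the root node's items are kept in a list, and the function name `merge_with_root` and its parameters are hypothetical. The claims do not specify how the merge is carried out on the GPU.

```python
import heapq
from typing import List, Tuple


def merge_with_root(
    root: List[float], new_items: List[float], r: int
) -> Tuple[List[float], List[float]]:
    """Merge new items with the root node's items.

    Returns (the r smallest items, the leftover items). In a parallel heap
    the leftover items would be pushed down the tree during the next
    delete-insert cycle; that maintenance step is not modeled here.
    """
    # heapq.merge expects sorted inputs, so sort both sides first.
    merged = list(heapq.merge(sorted(root), sorted(new_items)))
    return merged[:r], merged[r:]


# Example: the root holds three items, the application produced two new ones,
# and the application is resumed with the R = 3 smallest items overall.
smallest, leftover = merge_with_root([2.0, 5.0, 7.0], [1.0, 6.0], r=3)
```

Here `smallest` is `[1.0, 2.0, 5.0]` and `leftover` is `[6.0, 7.0]`. Because the new items take part in the merge before anything is returned, a freshly inserted item with a very small key can be served in the same cycle.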

12. A method, comprising:
- implementing, by a graphics processing unit (GPU) comprising a plurality of streaming multi-core processors, a parallel heap manager and a priority queue application concurrently in the GPU, wherein at least one kernel function of the parallel heap manager is assigned to a first stream and at least one kernel function of the priority queue application is assigned to a second stream;
  
  implementing, by the priority queue application in the first stream, a priority queue as a parallel heap where a plurality of operations performed on the priority queue are performed in parallel;
  
  providing, by a host processor comprising a central processing unit (CPU) in communication with the GPU, a programmatic interface for retrieving, inserting, and deleting a plurality of queue entries in the parallel heap;
  
  maintaining, by a controller executed in the host processor, a priority order as the plurality of queue entries are inserted and deleted from the priority queue; and
  
  synchronizing, by the controller executed in the host processor, a plurality of operations performed by the parallel heap manager and the priority queue application on the parallel heap of the GPU using a global barrier.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20):
  - 13. The method of claim 12, wherein each of the plurality of streaming multi-core processors employs a single instruction multiple thread (SIMT) architecture.
  - 14. The method of claim 12, wherein the priority queue and the parallel heap are further synchronized using kernel synchronization.
  - 15. The method of claim 12, wherein the plurality of operations synchronized include an insert operation and a delete operation.
  - 16. The method of claim 12, further comprising:
    - notifying, by the priority queue application executed in the GPU, the controller when one or more new items are ready for insertion into the parallel heap;
      
      suspending, by the controller executed in the CPU, the priority queue application after the notification;
      
      requesting, by the controller executed in the CPU, the parallel heap manager executed in the GPU to merge the one or more new items with items at a root node of the parallel heap;
      
      receiving, by the controller executed in the CPU, a plurality of R smallest items from the parallel heap manager after the parallel heap manager has completed the requested merge;
      
      resuming, by the controller executed in the CPU, the suspended priority queue application with the plurality of R smallest items; and
      
      requesting, by the controller executed in the CPU, the parallel heap manager to begin a new delete-insert cycle to maintain the parallel heap.
  - 17. The method of claim 16, wherein the resuming and the requesting of the new delete-insert cycle are performed concurrently.
  - 18. The method of claim 16, wherein the parallel heap further comprises a plurality of parallel heaps, wherein each of the parallel heaps corresponds to one of the plurality of SPs for processing.
  - 19. The method of claim 16, wherein the GPU further comprises a general computing graphics processing unit (GCGPU) comprising a many-core architecture.
  - 20. The method of claim 12, wherein the at least one kernel function of the parallel heap manager assigned to the first stream and the at least one kernel function of the priority queue application assigned to the second stream are executed in a first-in-first-out (FIFO) manner.
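
The global-barrier synchronization recited in claim 12 can be illustrated with a small CPU-only analogy. In this sketch two Python threads stand in for the two GPU streams (heap manager and application) and the main thread stands in for the host-side controller. All names are hypothetical, and the sketch makes no claim about how the barrier is realized in the patented system.

```python
import threading
from typing import List

CYCLES = 3
# Three parties: heap manager, application, and the controller (main thread).
barrier = threading.Barrier(3)
log: List[str] = []
log_lock = threading.Lock()


def worker(name: str) -> None:
    for cycle in range(CYCLES):
        with log_lock:
            log.append(f"{name}:{cycle}")  # the work done in this cycle
        # No party starts cycle + 1 until every party has finished this cycle.
        barrier.wait()


threads = [
    threading.Thread(target=worker, args=(name,))
    for name in ("heap_manager", "application")
]
for t in threads:
    t.start()

completed: List[int] = []
for cycle in range(CYCLES):
    barrier.wait()  # the controller joins the barrier once per cycle
    completed.append(cycle)

for t in threads:
    t.join()

# Cycle numbers in log order never decrease: work from cycle k is always
# recorded before any work from cycle k + 1.
cycle_numbers = [int(item.split(":")[1]) for item in log]
```

The order of the two workers within a cycle is not fixed, but the barrier guarantees that the cycles themselves never interleave, which is the property a delete-insert cycle relies on.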

Current Assignee: Georgia State University Research Foundation, Inc.
Original Assignee: Georgia State University Research Foundation, Inc.
Inventors: Prasad, Sushil K.; He, Xi; Agarwal, Dinesh
Primary Examiner(s): SALVUCCI, MATTHEW D
Application Number: US14/653,569
Publication Number: US 20150309846A1
Time in Patent Office: 1,755 Days
CPC Class Codes
- G06F 12/00   Accessing, addressing or al...
- G06F 2209/548   Queue
- G06F 7/24   Sorting, i.e. extracting da...
- G06F 9/3887   controlled by a single inst...
- G06F 9/4881   Scheduling strategies for d...
- G06F 9/522   Barrier synchronisation
- G06F 9/545   where tasks reside in diffe...
- G06F 9/546   Message passing systems or ...
- G06T 1/20   Processor architectures; Pr...
