Graphics processing unit buffer management

US 9,256,915 B2
Filed: 01/23/2013
Issued: 02/09/2016
Est. Priority Date: 01/27/2012
Status: Active Grant

Abstract

The techniques are generally related to management of buffers with a management unit that resides within an integrated circuit that includes a graphics processing unit (GPU). The management unit may ensure proper access to the buffers by the programmable compute units of the GPU to allow the GPU to execute kernels on the programmable compute units in a pipeline fashion.
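
As a rough illustration of the pipeline arrangement the abstract describes, the sketch below models a management unit that mediates every buffer access between a producer kernel and a consumer kernel. It is a software analogy only: the patent describes hardware within the IC, and every name here (`ManagementUnit`, `producer_kernel`, `consumer_kernel`, `run_pipeline`) is hypothetical and chosen for illustration.

```python
from collections import deque


class ManagementUnit:
    """Hypothetical software stand-in for the on-chip management unit."""

    def __init__(self):
        self._fifo = deque()  # stands in for the FIFO buffer in global memory

    def store(self, item):
        # The producer only hands over data; the unit decides where it goes.
        self._fifo.append(item)

    def retrieve(self):
        # Returns (available, item); the consumer never inspects the buffer.
        if not self._fifo:
            return False, None
        return True, self._fifo.popleft()


def producer_kernel(values, unit):
    """First pipeline stage: squares each input and hands it to the unit."""
    for v in values:
        unit.store(v * v)
        yield  # let the other stage run


def consumer_kernel(count, unit, out):
    """Second pipeline stage: adds one to each item once it is available."""
    consumed = 0
    while consumed < count:
        available, item = unit.retrieve()
        if available:
            out.append(item + 1)
            consumed += 1
        yield


def run_pipeline(values):
    """Interleave both stages round-robin until each has finished."""
    unit, out = ManagementUnit(), []
    stages = [producer_kernel(values, unit),
              consumer_kernel(len(values), unit, out)]
    while stages:
        for stage in list(stages):
            try:
                next(stage)
            except StopIteration:
                stages.remove(stage)
    return out
```

Calling `run_pipeline([1, 2, 3])` interleaves the two stages so the consumer starts working before the producer has finished, which is the sense of "pipeline fashion" assumed in this sketch.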

23 Claims

1. A method for execution of data processing operations in a pipeline fashion, the method comprising:
- executing a first thread on a first programmable compute unit of a shader processor of a graphics processing unit (GPU), wherein the shader processor includes a plurality of programmable compute units including the first programmable compute unit;
  
  executing a second thread on a second programmable compute unit of the plurality of programmable compute units of the shader processor of the GPU;
  
  receiving, directly with a management unit within an integrated circuit (IC) that includes the GPU, a request from the first programmable compute unit to store data produced by the execution of the first thread into a buffer in an integrated global memory external to the IC shared by the plurality of programmable compute units, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread, and wherein the buffer comprises a first-in-first-out (FIFO) buffer;
  
  determining, directly with the management unit, a location within the buffer where the data produced by the execution of the first thread is to be stored; and
  
  storing, with the IC, the data produced by the execution of the first thread in the determined location within the buffer.
  - 2. The method of claim 1, further comprising:
    - storing, with the management unit, state information of the buffer within the IC, wherein the state information of the buffer includes one or more of a starting address of the buffer, an ending address of the buffer, an address within the buffer where produced data is to be stored, and an address within the buffer where data is to be retrieved, wherein determining the location within the buffer comprises determining the location within the buffer for where the data produced by the execution of the first thread is to be stored based on the stored state information of the buffer.
  - 3. The method of claim 1, further comprising:
    - receiving, with the management unit, a request from the second programmable compute unit executing the second thread to retrieve at least some of the data produced by the execution of the first thread; and
      
      determining, with the management unit, whether the data that is produced by the execution of the first thread is available for retrieval for consumption by the second programmable compute unit executing the second thread.
  - 4. The method of claim 3, wherein receiving the request from the second programmable compute unit comprises receiving the request from the second programmable compute unit at a same time, prior to, or after receiving the request from the first programmable compute unit to store data produced by the execution of the first thread.
  - 5. The method of claim 3, further comprising:
    - when the data requested by the second thread is not available for retrieval for consumption by the second programmable compute unit executing the second thread, indicating, with the management unit, to the second programmable compute unit to execute a third thread;
      
      indicating, with the management unit, to the second programmable compute unit when the data requested by the second thread is available for retrieval for consumption by the second programmable compute unit executing the second thread; and
      
      indicating, with the management unit, to the second programmable compute unit to execute the second thread to consume the data requested by the second thread when the data requested by the second thread is available for retrieval for consumption by the second programmable compute unit executing the second thread.
  - 6. The method of claim 3, further comprising:
    - retrieving, with the management unit, data from the global memory in addition to the data requested by the second thread; and
      
      storing, with the management unit, the data in addition to the data requested by the second thread in a cache within the IC.
  - 7. The method of claim 1, wherein executing the first thread comprises executing a producer thread of a kernel, and wherein executing the second thread comprises executing a consumer thread of the kernel.
  - 8. The method of claim 1, wherein executing the first thread comprises executing the first thread of a producer kernel, and wherein executing the second thread comprises executing a thread of a consumer kernel.
  - 9. The method of claim 1, wherein the GPU includes the management unit, and wherein the FIFO buffer comprises a ring buffer.
  - 10. The method of claim 1, wherein determining the location within the buffer comprises determining the location within the buffer for where the data produced by the execution of the first thread is to be stored without the first thread indicating the location of where the data is to be stored in the buffer.
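
The state information of claim 2, the availability check of claim 3, the ring buffer of claim 9, and the location determination of claim 10 can be pictured together with a small sketch. This is an illustrative model under stated assumptions, not the patented implementation: the class name `RingBufferState`, the fixed `slot_size`, the exclusive `end` address, and the `count` field used to tell a full buffer from an empty one are all hypothetical choices made here.

```python
class RingBufferState:
    """Hypothetical per-buffer state a management unit might keep on-chip."""

    def __init__(self, start, end, slot_size):
        if slot_size <= 0 or (end - start) % slot_size != 0:
            raise ValueError("buffer size must be a multiple of slot_size")
        self.start = start          # starting address of the buffer
        self.end = end              # ending address (exclusive in this sketch)
        self.slot_size = slot_size  # bytes per item (assumed fixed)
        self.write_addr = start     # where produced data is to be stored next
        self.read_addr = start      # where data is to be retrieved next
        self.count = 0              # items currently held (assumption)

    def capacity(self):
        return (self.end - self.start) // self.slot_size

    def _advance(self, addr):
        # Step to the next slot, wrapping to the start at the end address.
        nxt = addr + self.slot_size
        return self.start if nxt >= self.end else nxt

    def reserve_store(self):
        """Pick the store location; the caller never supplies an address."""
        if self.count == self.capacity():
            return None  # buffer full, producer must retry later
        addr = self.write_addr
        self.write_addr = self._advance(addr)
        self.count += 1
        return addr

    def data_available(self):
        """Availability check made when a consumer requests data."""
        return self.count > 0

    def reserve_retrieve(self):
        """Pick the retrieve location in first-in-first-out order."""
        if not self.data_available():
            return None
        addr = self.read_addr
        self.read_addr = self._advance(addr)
        self.count -= 1
        return addr
```

In this sketch the producing thread only asks for a store; the returned address comes entirely from the stored state, which mirrors the "without the first thread indicating the location" wording of claim 10.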

11. An apparatus comprising:
- an integrated global memory shared by a plurality of programmable compute units that includes a buffer, wherein the buffer comprises a first-in-first-out (FIFO) buffer;
  
  an integrated circuit (IC) comprising:
  
  a graphics processing unit (GPU), the GPU comprising:
  
  the plurality of programmable compute units;
  
  a first programmable compute unit of the plurality of programmable compute units configured to execute a first thread; and
  
  a second programmable compute unit of the plurality of programmable compute units configured to execute a second thread; and
  
  a management unit configured to:
  
  directly receive a request from the first programmable compute unit to store data produced by the execution of the first thread into the buffer in the global memory, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread; and
  
  directly determine a location within the buffer where the data produced by the execution of the first thread is to be stored, wherein the IC is configured to store the data produced by the execution of the first thread in the determined location within the buffer.
  - 12. The apparatus of claim 11, wherein the management unit is configured to store state information of the buffer within the IC, wherein the state information of the buffer includes one or more of a starting address of the buffer, an ending address of the buffer, an address within the buffer where produced data is to be stored, and an address within the buffer where data is to be retrieved, and wherein the management unit is configured to determine the location within the buffer for where the data produced by the execution of the first thread is to be stored based on the stored state information of the buffer.
  - 13. The apparatus of claim 11, wherein the management unit is configured to:
    - receive a request from the second programmable compute unit executing the second thread to retrieve at least some of the data produced by the execution of the first thread; and
      
      determine whether the data that is produced by the execution of the first thread is available for retrieval for consumption by the second programmable compute unit executing the second thread.
  - 14. The apparatus of claim 13, wherein the management unit is configured to receive the request from the second programmable compute unit at a same time, prior to, or after receiving the request from the first programmable compute unit to store data produced by the execution of the first thread.
  - 15. The apparatus of claim 13, wherein the management unit is configured to:
    - when the data requested by the second thread is not available for retrieval for consumption by the second programmable compute unit executing the second thread, indicate to the second programmable compute unit to execute a third thread;
      
      indicate to the second programmable compute unit when the data requested by the second thread is available for retrieval for consumption by the second programmable compute unit executing the second thread; and
      
      indicate to the second programmable compute unit to execute the second thread to consume the data requested by the second thread when the data requested by the second thread is available for retrieval for consumption by the second programmable compute unit executing the second thread.
  - 16. The apparatus of claim 13, wherein the management unit is configured to:
    - retrieve, from the global memory, data in addition to the data requested by the second thread; and
      
      store the data in addition to the data requested by the second thread in a cache within the IC.
  - 17. The apparatus of claim 11, wherein the first thread comprises a producer thread of a kernel, and the second thread comprises a consumer thread of the kernel.
  - 18. The apparatus of claim 11, wherein the first thread comprises a thread of a producer kernel, and the second thread comprises a thread of a consumer kernel.
  - 19. The apparatus of claim 11, wherein the GPU includes the management unit, and wherein the FIFO buffer comprises a ring buffer.
  - 20. The apparatus of claim 11, wherein the management unit is configured to determine the location within the buffer for where the data produced by the execution of the first thread is to be stored without the first thread indicating the location of where the data is to be stored in the buffer.
  - 21. The apparatus of claim 11, wherein the apparatus comprises one of a video device, a set-top box, a wireless handset, a personal digital assistant, a desktop computer, a laptop computer, a gaming console, a video conferencing unit, and a tablet computing device.
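
The deferred-consumer behaviour recited in claims 5 and 15 can be sketched as follows. When a retrieve request arrives before any data has been stored, the unit tells the compute unit to run a different thread, and later signals that the waiting thread can resume. The class `SchedulingManagementUnit`, its method names, the log strings, and the `demo` helper are hypothetical and exist only to make the sequence of indications visible; the patent does not specify such an interface.

```python
from collections import deque


class SchedulingManagementUnit:
    """Hypothetical model of the indications a management unit might send."""

    def __init__(self):
        self._fifo = deque()     # stands in for the FIFO buffer
        self._waiting = deque()  # (compute unit, parked thread) pairs
        self.log = []            # record of indications, for illustration

    def request_retrieve(self, compute_unit, thread_name):
        if self._fifo:
            return self._fifo.popleft()
        # Data not yet available: park the consumer thread and indicate
        # that the compute unit should execute a different (third) thread.
        self._waiting.append((compute_unit, thread_name))
        self.log.append(
            f"{compute_unit}: run a third thread while {thread_name} waits")
        return None

    def request_store(self, item):
        self._fifo.append(item)
        if self._waiting:
            # Data is now available: indicate that the parked thread
            # can be executed again to consume it.
            compute_unit, thread_name = self._waiting.popleft()
            self.log.append(
                f"{compute_unit}: data available, resume {thread_name}")


def demo():
    unit = SchedulingManagementUnit()
    first = unit.request_retrieve("CU1", "second_thread")   # nothing stored yet
    unit.request_store(42)                                  # producer stores
    second = unit.request_retrieve("CU1", "second_thread")  # now succeeds
    return first, second, unit.log
```

Keeping the compute unit busy with other work while the consumer waits is the design point being illustrated; how a real GPU would pick the third thread is outside the scope of this sketch.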

22. An apparatus comprising:
- an integrated global memory shared by a plurality of programmable compute units that includes a buffer, wherein the buffer comprises a first-in-first-out (FIFO) buffer; and
  
  an integrated circuit (IC) comprising:
  
  a graphics processing unit (GPU) comprising:
  
  means for executing a first thread; and
  
  means for executing a second thread; and
  
  means for directly receiving a request from the means for executing the first thread to store data produced by the execution of the first thread into the buffer in the global memory, wherein the data produced by the execution of the first thread is to be consumed by the means for executing the second thread;
  
  means for directly determining a location within the buffer where the data produced by the means for executing the first thread is to be stored; and
  
  means for storing the data produced by the execution of the first thread in the determined location within the buffer.

23. A non-transitory computer-readable storage medium having instructions stored thereon that when executed cause one or more processors to:
- execute a first thread on a first programmable compute unit of a shader processor of a graphics processing unit (GPU), wherein the shader processor includes a plurality of programmable compute units including the first programmable compute unit;
  
  execute a second thread on a second programmable compute unit of the plurality of programmable compute units of the shader processor of the GPU;
  
  receive, directly with a management unit within an integrated circuit (IC) that includes the GPU, a request from the first programmable compute unit to store data produced by the execution of the first thread into a buffer in an integrated global memory external to the IC shared by the plurality of programmable compute units, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread, and wherein the buffer comprises a first-in-first-out (FIFO) buffer;
  
  determine, directly with the management unit, a location within the buffer where the data produced by the execution of the first thread is to be stored; and
  
  store, with the IC, the data produced by the execution of the first thread in the determined location within the buffer.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Bourd, Alexei V.; Goel, Vineet
Primary Examiner(s): Tung, Kee M
Assistant Examiner(s): Wang, Yuehan
Application Number: US13/747,947
Publication Number: US 20130194286A1
Time in Patent Office: 1,112 Days
Field of Search: 345/545
US Class Current: 1/1

CPC Class Codes
- G06F 9/5038   considering the execution o...
- G06F 9/52   Program synchronisation; Mu...
- G06F 9/544   Buffers; Shared memory; Pipes
- G06T 1/20   Processor architectures; Pr...
- G06T 1/60   Memory management
- G09G 2360/10   Display system comprising a...
- G09G 2360/121   using a cache memory
- G09G 5/001   Arbitration of resources in...
- G09G 5/363   Graphics controllers
