Graphics processing unit buffer management
Abstract
The techniques are generally related to management of buffers with a management unit that resides within an integrated circuit that includes a graphics processing unit (GPU). The management unit may ensure proper access to the buffers by the programmable compute units of the GPU to allow the GPU to execute kernels on the programmable compute units in a pipeline fashion.
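The producer/consumer arrangement in the abstract can be illustrated with a minimal Python sketch. It assumes two threads stand in for the two programmable compute units, a bounded `deque` stands in for the FIFO buffer in global memory, and a condition variable stands in for the management unit's access control. `ManagementUnit`, `store`, `load`, and `run_pipeline` are hypothetical names, and the sketch only models the ordering guarantee; it does not describe how the patent implements the management unit in hardware.

```python
import threading
from collections import deque


class ManagementUnit:
    """Hypothetical software stand-in for the on-chip management unit.

    It owns a bounded FIFO that models the buffer in shared global memory
    and serializes producer and consumer access to it.
    """

    def __init__(self, capacity):
        self._fifo = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def store(self, item):
        # Block the producer while the FIFO is full.
        with self._cond:
            while len(self._fifo) >= self._capacity:
                self._cond.wait()
            self._fifo.append(item)
            self._cond.notify_all()

    def load(self):
        # Block the consumer while the FIFO is empty.
        with self._cond:
            while not self._fifo:
                self._cond.wait()
            item = self._fifo.popleft()
            self._cond.notify_all()
            return item


def run_pipeline(inputs, capacity=2):
    mu = ManagementUnit(capacity)
    results = []
    done = object()  # sentinel marking the end of the stream

    def producer_kernel():
        # Stage 1 (first compute unit): square each input.
        for x in inputs:
            mu.store(x * x)
        mu.store(done)

    def consumer_kernel():
        # Stage 2 (second compute unit): add one to each produced value.
        while True:
            item = mu.load()
            if item is done:
                break
            results.append(item + 1)

    threads = [threading.Thread(target=producer_kernel),
               threading.Thread(target=consumer_kernel)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([1, 2, 3, 4]))  # [2, 5, 10, 17]
```

Because the buffer is bounded, the producer stage stalls when it runs ahead of the consumer stage, and both stages otherwise run concurrently, which is the pipelined behavior the abstract refers to.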
23 Claims
1. A method for execution of data processing operations in a pipeline fashion, the method comprising:
executing a first thread on a first programmable compute unit of a shader processor of a graphics processing unit (GPU), wherein the shader processor includes a plurality of programmable compute units including the first programmable compute unit;
executing a second thread on a second programmable compute unit of the plurality of programmable compute units of the shader processor of the GPU;
receiving, directly with a management unit within an integrated circuit (IC) that includes the GPU, a request from the first programmable compute unit to store data produced by the execution of the first thread into a buffer in an integrated global memory external to the IC shared by the plurality of programmable compute units, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread, and wherein the buffer comprises a first-in-first-out (FIFO) buffer;
determining, directly with the management unit, a location within the buffer where the data produced by the execution of the first thread is to be stored; and
storing, with the IC, the data produced by the execution of the first thread in the determined location within the buffer.
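Claim 1 does not say how the management unit determines the location within the FIFO buffer. One common technique is a circular buffer tracked by monotonically increasing write and read counters, and the following Python sketch assumes that technique; it is an illustration, not the method the claim recites. `FifoLocationManager`, `determine_write_location`, `determine_read_location`, and `demo` are hypothetical names. Only slot indices are managed here; the data itself would reside in global memory.

```python
class FifoLocationManager:
    """Hypothetical model of how a management unit might choose slots in a
    circular FIFO buffer (an assumed technique, not taken from the claims)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.write_count = 0  # total slots handed out to the producer
        self.read_count = 0   # total slots released to the consumer

    def determine_write_location(self):
        """Return the slot for the next store, or None if the FIFO is full."""
        if self.write_count - self.read_count == self.capacity:
            return None  # producer must retry once the consumer catches up
        slot = self.write_count % self.capacity
        self.write_count += 1
        return slot

    def determine_read_location(self):
        """Return the slot holding the oldest unread data, or None if empty."""
        if self.write_count == self.read_count:
            return None
        slot = self.read_count % self.capacity
        self.read_count += 1
        return slot


def demo():
    buffer = [None] * 4  # stands in for the FIFO region of global memory
    mu = FifoLocationManager(capacity=4)
    writes = []
    for value in ["a", "b", "c"]:
        loc = mu.determine_write_location()
        buffer[loc] = value  # store step: data lands in the chosen slot
        writes.append(loc)
    # The consumer drains the two oldest entries.
    consumed = [buffer[mu.determine_read_location()] for _ in range(2)]
    for value in ["d", "e", "f", "g"]:
        loc = mu.determine_write_location()
        if loc is None:
            writes.append(None)  # FIFO full: this request cannot be placed yet
            continue
        buffer[loc] = value
        writes.append(loc)
    return writes, consumed


print(demo())  # ([0, 1, 2, 3, 0, 1, None], ['a', 'b'])
```

Using the difference between two counters, rather than comparing wrapped indices, lets the sketch tell a full buffer from an empty one without leaving a slot unused; a hardware design with fixed-width counters would need wraparound-safe arithmetic instead.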
11. An apparatus comprising:
an integrated global memory shared by a plurality of programmable compute units that includes a buffer, wherein the buffer comprises a first-in-first-out (FIFO) buffer;
an integrated circuit (IC) comprising:
a graphics processing unit (GPU), the GPU comprising:
the plurality of programmable compute units;
a first programmable compute unit of the plurality of programmable compute units configured to execute a first thread; and
a second programmable compute unit of the plurality of programmable compute units configured to execute a second thread; and
a management unit configured to:
directly receive a request from the first programmable compute unit to store data produced by the execution of the first thread into the buffer in the global memory, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread; and
directly determine a location within the buffer where the data produced by the execution of the first thread is to be stored,
wherein the IC is configured to store the data produced by the execution of the first thread in the determined location within the buffer.
22. An apparatus comprising:
an integrated global memory shared by a plurality of programmable compute units that includes a buffer, wherein the buffer comprises a first-in-first-out (FIFO) buffer; and
an integrated circuit (IC) comprising:
a graphics processing unit (GPU) comprising:
means for executing a first thread; and
means for executing a second thread; and
means for directly receiving a request from the means for executing the first thread to store data produced by the execution of the first thread into the buffer in the global memory, wherein the data produced by the execution of the first thread is to be consumed by the means for executing the second thread;
means for directly determining a location within the buffer where the data produced by the means for executing the first thread is to be stored; and
means for storing the data produced by the execution of the first thread in the determined location within the buffer.
23. A non-transitory computer-readable storage medium having instructions stored thereon that when executed cause one or more processors to:
execute a first thread on a first programmable compute unit of a shader processor of a graphics processing unit (GPU), wherein the shader processor includes a plurality of programmable compute units including the first programmable compute unit;
execute a second thread on a second programmable compute unit of the plurality of programmable compute units of the shader processor of the GPU;
receive, directly with a management unit within an integrated circuit (IC) that includes the GPU, a request from the first programmable compute unit to store data produced by the execution of the first thread into a buffer in an integrated global memory external to the IC shared by the plurality of programmable compute units, wherein the data produced by the execution of the first thread is to be consumed by the second programmable compute unit executing the second thread, and wherein the buffer comprises a first-in-first-out (FIFO) buffer;
determine, directly with the management unit, a location within the buffer where the data produced by the execution of the first thread is to be stored; and
store, with the IC, the data produced by the execution of the first thread in the determined location within the buffer.
Specification