Instruction culling in graphics processing unit

US 9,195,501 B2
Filed: 07/12/2011
Issued: 11/24/2015
Est. Priority Date: 07/12/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Aspects of the disclosure are directed to a method of processing data with a graphics processing unit (GPU). According to some aspects, the method includes executing a first work item with a shader processor of the GPU, wherein the first work item includes one or more instructions for processing input data. The method also includes generating one or more values based on a result of the first work item, wherein the one or more values represent one or more characteristics of the result. The method also includes determining whether to execute a second work item based on the one or more values, wherein the second work item includes one or more instructions that are distinct from the one or more instructions of the first work item for processing the input data.
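The abstract describes a simple two-stage idea: a first work item produces its normal result and also a value describing that result, and that value decides whether a second work item (with different instructions) runs on the same input. Below is a minimal sketch of that control flow, not Qualcomm's implementation. The stage bodies, the squaring "score", the `threshold` parameter, and all names (`first_work_item`, `second_work_item`, `process`) are hypothetical and were chosen only to make the culling decision visible. Here a cull value of `True` means "execute the next work item".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FirstResult:
    value: float        # ordinary result of the first work item
    execute_next: bool  # cull value: True if the second work item should run

def first_work_item(x: float, threshold: float) -> FirstResult:
    # Hypothetical cheap first stage: square the input.
    score = x * x
    # The cull value records one characteristic of the result,
    # here whether the score reaches a threshold.
    return FirstResult(value=score, execute_next=score >= threshold)

def second_work_item(x: float) -> float:
    # Hypothetical second stage whose instructions differ from the first;
    # it stands in for a more expensive computation on the same input.
    return x ** 3 + 2.0 * x

def process(inputs: List[float], threshold: float = 0.5) -> Tuple[List[Optional[float]], int]:
    """Run both stages per input, skipping stage two when the cull value says so.

    Returns the stage-two results (None where culled) and the number of
    times stage two actually ran.
    """
    results: List[Optional[float]] = []
    executed = 0
    for x in inputs:
        first = first_work_item(x, threshold)
        if not first.execute_next:
            results.append(None)  # culled: the second work item never runs
            continue
        executed += 1
        results.append(second_work_item(x))
    return results, executed
```

For example, `process([0.1, 1.0, 0.2, 2.0])` runs the second stage only for `1.0` and `2.0`, because their squares meet the 0.5 threshold. It returns `([None, 3.0, None, 12.0], 2)`.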

30 Citations

25 Claims

1. A method of processing data with a graphics processing unit (GPU), the method comprising:
- executing, with one or more shader processors of the GPU, a first work item of a first kernel of an application that includes the first kernel and one or more consecutively executed second kernels, wherein the first work item includes one or more instructions for processing input data;
  
  generating, in addition to a result of the first work item, a plurality of cull values based on the result of the first work item of the first kernel, wherein the plurality of cull values indicate whether to execute work items of the one or more second kernels on the input data; and
  
  when the plurality of cull values indicate that the work items of the one or more second kernels are not to be executed, determining not to execute the work items of the one or more second kernels and removing the work items of the one or more second kernels from the instruction stream prior to scheduling the work items to be executed by the one or more shader processors.
  - 2. The method of claim 1, further comprising storing the plurality of cull values in a buffer, and wherein determining whether to execute the work items of the one or more second kernels comprises reading the plurality of cull values stored in the buffer.
  - 3. The method of claim 2, further comprising:
    - executing a second work item of a second kernel of the one or more second kernels with the shader processor of the GPU on the input data;
      
      updating the plurality of cull values based on a result of the second work item, wherein the updated plurality of cull values indicate whether to execute subsequent work items of the one or more second kernels on the input data;
      
      and determining whether to execute the subsequent work items based on the updated plurality of cull values.
  - 4. The method of claim 1, further comprising:
    - executing a first workgroup with the shader processor of the GPU, wherein the first workgroup is associated with the first kernel and wherein the first workgroup comprises a plurality of instructions including the first work item for processing the input data;
      
      generating one or more workgroup cull values based on results of the first workgroup, wherein the one or more workgroup cull values indicate whether to execute workgroups of the one or more second kernels; and
      
      determining whether to execute the workgroups of the one or more second kernels based on the one or more workgroup cull values.
  - 5. The method of claim 1, wherein the one or more second kernels comprises a plurality of kernels, and wherein generating the plurality of cull values comprises generating one or more cull values that indicate whether to execute all of the work items of the plurality of kernels on the input data.
  - 6. The method of claim 1, wherein each respective cull value of the plurality represents a respective characteristic of a result of the first work item.
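Claims 1 through 5 describe a chain of consecutively executed kernels that share a cull buffer. Each kernel's work items update the per-item cull values, and culled work items are removed from the stream before scheduling. The following is a minimal sketch of that flow, under the assumptions that there is one work item per input element, that a kernel is modeled as a plain Python function returning `(result, execute_next)`, and that "scheduling" is simply the list of indices handed to the loop. `run_kernel_chain` and the example kernels are hypothetical. In this sketch a cull value that becomes `False` is never reset, because that element is never scheduled again. As a result it suppresses every remaining kernel for that element, which is one possible reading of dependent claim 5 and not necessarily the patent's.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical kernel signature: one input element in, (result, execute_next) out.
Kernel = Callable[[float], Tuple[float, bool]]

def run_kernel_chain(inputs: List[float],
                     kernels: List[Kernel]) -> Tuple[List[List[Optional[float]]], List[int]]:
    # Cull buffer: one cull value per work item (input element).
    # True means later kernels should still execute on that element.
    cull_buffer = [True] * len(inputs)
    outputs: List[List[Optional[float]]] = []  # outputs[k][i], None if culled
    scheduled: List[int] = []                  # work items scheduled per kernel
    for kernel in kernels:
        # Read the cull buffer and drop culled work items from the stream
        # before anything is handed to the scheduler.
        stream = [i for i, live in enumerate(cull_buffer) if live]
        scheduled.append(len(stream))
        out: List[Optional[float]] = [None] * len(inputs)
        for i in stream:
            result, execute_next = kernel(inputs[i])
            out[i] = result
            # Update the cull value from this kernel's result.
            cull_buffer[i] = execute_next
        outputs.append(out)
    return outputs, scheduled

# Hypothetical example kernels.
def kernel_a(x: float) -> Tuple[float, bool]:
    return 2.0 * x, x > 0.0    # cull non-positive inputs

def kernel_b(x: float) -> Tuple[float, bool]:
    return x + 1.0, x < 10.0   # cull large inputs

def kernel_c(x: float) -> Tuple[float, bool]:
    return x * x, True         # last kernel: nothing further to cull
```

With inputs `[-1.0, 3.0, 12.0, 5.0]`, the three kernels are scheduled for 4, 3, and 2 work items respectively. The element `-1.0` is culled after `kernel_a`, and `12.0` is culled after `kernel_b`.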

7. An apparatus for processing data with a graphics processing unit (GPU), the apparatus comprising:
- one or more shader processors configured to:
  
  execute a first work item of a first kernel of an application that includes the first kernel and one or more consecutively executed second kernels, wherein the first work item includes one or more instructions for processing input data, and generate, in addition to a result of the first work item, a plurality of cull values based on the result of the first work item of the first kernel, wherein the plurality of cull values indicate whether to execute work items of the one or more second kernels on the input data; and
  
  a cull module configured to, when the plurality of cull values indicate that the work items of the one or more second kernels are not to be executed, determine not to execute the work items of the one or more second kernels and remove the work items of the one or more second kernels from the instruction stream prior to scheduling the work items to be executed by the one or more shader processors.
  - 8. The apparatus of claim 7, further comprising a cull buffer configured to store the plurality of cull values, and wherein the cull module is configured to determine whether to execute the work items of the one or more second kernels by reading the plurality of cull values stored in the cull buffer.
  - 9. The apparatus of claim 8, wherein the one or more shader processors are further configured to:
    - execute a second work item of a second kernel of the one or more second kernels;
      
      update the plurality of cull values based on a result of the second work item, wherein the updated plurality of cull values indicate whether to execute subsequent work items of the one or more second kernels on the input data; and
      
      determine whether to execute the subsequent work items based on the updated plurality of cull values.
  - 10. The apparatus of claim 7, wherein the one or more shader processors are configured to:
    - execute a first workgroup that is associated with the first kernel, wherein the first workgroup comprises a plurality of instructions including the first work item for processing the input data;
      
      generate one or more workgroup cull values based on results of the first workgroup, wherein the one or more workgroup cull values indicate whether to execute workgroups of the one or more second kernels; and
      
      determine whether to execute the workgroups of the one or more second kernels based on the one or more workgroup cull values.
  - 11. The apparatus of claim 7, wherein the one or more shader processors and the cull module are included in a portable computing device.
  - 12. The apparatus of claim 7, wherein the one or more second kernels comprises a plurality of kernels, and wherein, to generate the plurality of cull values, the one or more shader processors are configured to generate one or more cull values that indicate whether to execute all of the work items of the plurality of kernels on the input data.
  - 13. The apparatus of claim 7, wherein each respective cull value of the plurality represents a respective characteristic of a result of the first work item.
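Claims 7 through 13 separate the shader processors, which produce the cull values, from a cull module, which acts on them before scheduling. Claim 10 also adds culling at workgroup granularity. The following is a minimal sketch of that split, under several assumptions. First, a workgroup's cull value is derived from its work items' cull values, so a group runs if any of its items still needs to run. Second, every kernel in the chain uses the same fixed workgroup decomposition, so `group_id` lines up across kernels. Third, the "cull buffer" is a list held by the module. `Workgroup`, `make_workgroups`, `workgroup_cull_values`, and `CullModule` are hypothetical names and do not come from the patent or any GPU API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Workgroup:
    kernel: str     # hypothetical kernel name
    group_id: int   # index of the workgroup within its kernel
    items: range    # global work-item indices covered by the workgroup

def make_workgroups(kernel: str, n_items: int, group_size: int) -> List[Workgroup]:
    # Split n_items work items into fixed-size workgroups (the last may be short).
    return [Workgroup(kernel, g, range(start, min(start + group_size, n_items)))
            for g, start in enumerate(range(0, n_items, group_size))]

def workgroup_cull_values(item_execute: List[bool], group_size: int) -> List[bool]:
    # One cull value per workgroup, derived from its work items' cull values:
    # the group must run if any of its items still needs to run.
    return [any(item_execute[s:s + group_size])
            for s in range(0, len(item_execute), group_size)]

class CullModule:
    """Hypothetical cull module between the command stream and the scheduler,
    kept separate from the shader processors that produce the cull values."""

    def __init__(self) -> None:
        self.group_execute: List[bool] = []  # acts as the cull buffer

    def store(self, group_execute: List[bool]) -> None:
        # Shader processors write workgroup cull values here after a kernel.
        self.group_execute = list(group_execute)

    def filter(self, stream: List[Workgroup]) -> List[Workgroup]:
        # Remove culled workgroups before they are scheduled.
        return [wg for wg in stream if self.group_execute[wg.group_id]]
```

Filtering at workgroup granularity means the module only checks one flag per group instead of one per work item. The tradeoff is that a group with even one live item is still scheduled in full.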

14. A non-transitory computer-readable storage medium encoded with instructions for causing one or more processors of a computing device to:
- execute, with one or more shader processors of a GPU of the computing device, a first work item of a first kernel of an application that includes the first kernel and one or more consecutively executed second kernels, wherein the first work item includes one or more instructions for processing input data;
  
  generate, in addition to a result of the first work item, a plurality of cull values based on the result of the first work item of the first kernel, wherein the plurality of cull values indicate whether to execute work items of the one or more second kernels on the input data; and
  
  when the plurality of cull values indicate that the work items of the one or more second kernels are not to be executed, determine not to execute the work items of the one or more second kernels and remove the work items of the one or more second kernels from the instruction stream prior to scheduling the work items to be executed by the one or more shader processors.
  - 15. The non-transitory computer-readable storage medium of claim 14, further comprising instructions for causing the one or more processors of the computing device to store the plurality of cull values in a cull buffer, and wherein, to determine whether to execute the work items of the one or more second kernels, the instructions cause the one or more processors to read the plurality of cull values stored in the cull buffer.
  - 16. The non-transitory computer-readable storage medium of claim 15, further comprising instructions for causing the one or more processors of the computing device to:
    - execute a second work item of a second kernel of the one or more second kernels with the shader processor of the GPU on the input data;
      
      update the plurality of cull values based on a result of the second work item, wherein the updated plurality of cull values indicate whether to execute subsequent work items of the one or more second kernels on the input data;
      
      and determine whether to execute the subsequent work items based on the updated plurality of cull values.
  - 17. The non-transitory computer-readable storage medium of claim 14, further comprising instructions for causing the one or more processors of the computing device to:
    - execute a first workgroup with the shader processor of the GPU, wherein the first workgroup is associated with the first kernel and wherein the first workgroup comprises a plurality of instructions including the first work item for processing the input data;
      
      generate one or more workgroup cull values based on results of the first workgroup, wherein the one or more workgroup cull values indicate whether to execute workgroups of the one or more second kernels; and
      
      determine whether to execute the workgroups of the one or more second kernels based on the one or more workgroup cull values.
  - 18. The non-transitory computer-readable storage medium of claim 14, wherein the one or more second kernels comprises a plurality of kernels, and wherein to generate the plurality of cull values, the instructions cause the one or more processors to generate one or more cull values that indicate whether to execute all of the work items of the plurality of kernels on the input data.
  - 19. The non-transitory computer-readable storage medium of claim 14, wherein each respective cull value of the plurality represents a respective characteristic of a result of the first work item.

20. An apparatus for processing data with a graphics processing unit (GPU), the apparatus comprising:
- a means for executing, with one or more shader processors of the GPU, a first work item of a first kernel of an application that includes the first kernel and one or more consecutively executed second kernels, wherein the first work item includes one or more instructions for processing input data;
  
  a means for generating, in addition to a result of the first work item, a plurality of cull values based on the result of the first work item of the first kernel, wherein the plurality of cull values indicate whether to execute work items of the one or more second kernels on the input data; and
  
  a means for determining, when the plurality of cull values indicate that the work items of the one or more second kernels are not to be executed, not to execute the work items of the one or more second kernels and removing the work items of the one or more second kernels from the instruction stream prior to scheduling the work items to be executed by the one or more shader processors.
  - 21. The apparatus of claim 20, further comprising a means for storing the plurality of cull values in a buffer, and wherein the means for determining whether to execute the work items of the one or more second kernels comprises means for reading the plurality of cull values stored in the buffer.
  - 22. The apparatus of claim 21, further comprising:
    - a means for executing a second work item of a second kernel of the one or more second kernels with the shader processor of the GPU on the input data;
      
      a means for updating the plurality of cull values based on a result of the second work item, wherein the updated plurality of cull values indicate whether to execute subsequent work items of the one or more second kernels on the input data;
      
      and a means for determining whether to execute the subsequent work items based on the updated plurality of cull values.
  - 23. The apparatus of claim 20, further comprising:
    - a means for executing a first workgroup with the shader processor of the GPU, wherein the first workgroup is associated with the first kernel and wherein the first workgroup comprises a plurality of instructions including the first work item for processing the input data;
      
      a means for generating one or more workgroup cull values based on results of the first workgroup, wherein the one or more workgroup cull values indicate whether to execute workgroups of the one or more second kernels; and
      
      a means for determining whether to execute the workgroups of the one or more second kernels based on the one or more workgroup cull values.
  - 24. The apparatus of claim 20, wherein the one or more second kernels comprises a plurality of kernels, and wherein the means for generating the plurality of cull values comprises means for generating one or more cull values that indicate whether to execute all of the work items of the plurality of kernels on the input data.
  - 25. The apparatus of claim 20, wherein each respective cull value of the plurality represents a respective characteristic of a result of the first work item.

Current Assignee
Qualcomm, Inc.
Original Assignee
Qualcomm, Inc.
Inventors
Arvo, Jukka-Pekka
Primary Examiner(s)
Wu, Xiao
Assistant Examiner(s)
Cobb, Michael J

Application Number

US13/181,233
Publication Number

US 20130016110A1
Time in Patent Office

1,596 Days
Field of Search

345/501, 345/522, 712/220-245, 712/E9016-E9073, 712/E9082-E9083, 718/102
US Class Current

1/1
CPC Class Codes

G06F 9/4881 Scheduling strategies for d...
