Systems and Methods for Improving Throughput of a Graphics Processing Unit
Abstract
Systems and methods for improving throughput of a graphics processing unit are disclosed. In one embodiment, a system includes a multithreaded execution unit capable of processing requests to access a constant cache, a vertex attribute cache, at least one common register file, and an execution unit data path substantially simultaneously.
22 Claims
1. A graphics processing unit, comprising:
an execution unit configured to execute programmable shader operations, wherein the execution unit is further configured to simultaneously process operations for a plurality of threads;
a first memory forming a register file configured to accommodate register operations for all threads executed by the execution unit, the memory being organized in a plurality of banks, with a first plurality of banks being allocated to a first plurality of the threads and a second plurality of banks being allocated to the remaining threads;
a second memory forming a constant cache configured to accommodate the fetching of constants for a plurality of shader operations executed within the execution unit, the constant cache configured to store a plurality of contexts of values corresponding to a plurality of types of shaders, the constant cache further configured to store a plurality of contexts and a plurality of versions of constant values in each context stored within the constant cache; and
a third memory forming a vertex attribute cache configured to accommodate the storing of vertex attributes processed by programmable shader operations executed by the execution unit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
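The memory organization recited in claim 1 can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the bank and thread counts, the lower-half/upper-half split, and the names `banks_for_thread` and `ConstantCache` are hypothetical and do not come from the patent text.

```python
from collections import defaultdict

NUM_BANKS = 8     # assumed bank count, not taken from the claim
NUM_THREADS = 16  # assumed thread count, not taken from the claim


def banks_for_thread(thread_id, num_threads=NUM_THREADS, num_banks=NUM_BANKS):
    """Return the register-file banks a thread may use.

    Assumption: the first half of the threads gets the first half of the
    banks and the remaining threads get the remaining banks.
    """
    if not 0 <= thread_id < num_threads:
        raise ValueError("thread_id out of range")
    half_banks = num_banks // 2
    if thread_id < num_threads // 2:
        return range(0, half_banks)
    return range(half_banks, num_banks)


class ConstantCache:
    """Toy constant cache: one context per shader type, several versions each."""

    def __init__(self):
        # shader type -> version number -> {constant index: value}
        self._contexts = defaultdict(dict)

    def store(self, shader_type, version, constants):
        # Keep a private copy so later edits by the caller do not leak in.
        self._contexts[shader_type][version] = dict(constants)

    def fetch(self, shader_type, version, index):
        return self._contexts[shader_type][version][index]
```

In this sketch, two versions of the same context can coexist, so a shader started against an older set of constants could keep reading them while a newer set is loaded.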
9. A graphics processing unit comprising:
an execution unit capable of multi-threaded operation, the execution unit having a thread controller, the thread controller including a first instruction fetch arbiter and a second instruction fetch arbiter;
wherein the first instruction fetch arbiter is configured to fetch instructions on behalf of at least half of a plurality of threads within the execution units; and
the second instruction fetch arbiter is configured to fetch instructions on behalf of the remainder of the plurality of threads.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
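The division of threads between the two instruction fetch arbiters of claim 9 might be sketched as follows. The round-robin selection policy, the contiguous split of thread IDs, and the names `FetchArbiter` and `make_arbiters` are assumptions made for illustration; the claim only requires that the first arbiter serve at least half of the threads and the second serve the remainder.

```python
class FetchArbiter:
    """Toy arbiter that picks one ready thread per call (round-robin, assumed)."""

    def __init__(self, thread_ids):
        self.thread_ids = list(thread_ids)
        self._next = 0  # position where the next search starts

    def pick(self, ready):
        """Return the next ready thread ID, or None if none is ready."""
        n = len(self.thread_ids)
        for offset in range(n):
            idx = (self._next + offset) % n
            tid = self.thread_ids[idx]
            if tid in ready:
                self._next = (idx + 1) % n
                return tid
        return None


def make_arbiters(num_threads):
    """Split threads so the first arbiter serves at least half of them."""
    first_count = (num_threads + 1) // 2  # rounds up for odd thread counts
    first = FetchArbiter(range(first_count))
    second = FetchArbiter(range(first_count, num_threads))
    return first, second
```

With five threads, for example, this sketch assigns threads 0 to 2 to the first arbiter and threads 3 and 4 to the second, so both arbiters can select a fetch candidate independently in the same cycle.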
17. A method of processing instructions in a graphics processing unit, comprising the steps of:
fetching a first instruction in an execution unit from an instruction cache on behalf of one of a plurality of active threads,
broadcasting the first instruction to the plurality of active threads,
queueing the first instruction in an instruction queue corresponding to at least one of the plurality of active threads,
decoding a second instruction in the instruction queue of at least one of the plurality of active threads,
submitting the second instruction data requests to at least one of:
a constant cache, a vertex attribute cache, a first common register file, a second common register file, and an execution unit data path.
Dependent claims: 18, 19, 20, 21, 22
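The steps of claim 17 can be walked through with a small model. This is a sketch under stated assumptions, not the patented implementation: instructions are modeled as hypothetical `(opcode, sources)` tuples, every active thread receives the broadcast, and the function and target names are invented for illustration.

```python
from collections import deque

# Destinations named in the claim; the identifiers are illustrative.
REQUEST_TARGETS = {
    "constant_cache",
    "vertex_attribute_cache",
    "common_register_file_0",
    "common_register_file_1",
    "eu_data_path",
}


def fetch_broadcast_queue(instruction_cache, pc, active_threads, queues):
    """Fetch one instruction, broadcast it, and queue it per active thread."""
    instr = instruction_cache[pc]   # fetch on behalf of one thread
    for tid in active_threads:      # broadcast to the active threads
        queues[tid].append(instr)   # queue in each thread's instruction queue
    return instr


def decode_and_submit(tid, queues):
    """Decode the oldest queued instruction and list its data requests."""
    if not queues[tid]:
        return []
    opcode, sources = queues[tid].popleft()
    for source in sources:
        if source not in REQUEST_TARGETS:
            raise ValueError(f"unknown request target: {source}")
    # One request per operand source the decoded instruction needs.
    return [(tid, opcode, source) for source in sources]
```

A short usage example: after two calls to `fetch_broadcast_queue` for threads 0 and 1, calling `decode_and_submit(0, queues)` drains the oldest instruction from thread 0's queue and returns one request per operand source, leaving thread 1's queue untouched.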
Specification