Unified virtual addressed register file
Abstract
A multi-threaded processor, such as a shader processor, is provided having an internal unified memory space that is shared by a plurality of threads and is dynamically assigned to threads as needed. A mapping table maps virtual registers to available internal addresses in the unified memory space so that thread registers can be stored in contiguous or non-contiguous memory addresses. Dynamic sizing of the virtual registers allows flexible allocation of the unified memory space depending on the type and size of data in a thread register. Yet another feature provides an efficient method for storing graphics data in the unified memory space to improve fetch and store operations from the memory space. In particular, pixel data for four pixels in a thread is stored across four memory devices having independent input/output ports that permit the four pixels to be read in a single clock cycle for processing.
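For illustration only, the virtual-to-internal mapping described in the abstract can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `UnifiedRegisterFile`, its methods, the one-slot-per-register granularity, and the lowest-free-address policy are all hypothetical choices made here.

```python
class UnifiedRegisterFile:
    """Toy model of a unified memory space shared by all threads (hypothetical)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots      # internal addresses 0..num_slots-1
        self.free = list(range(num_slots))   # internal addresses not in use
        self.mapping = {}                    # (thread_id, virtual_reg) -> internal address

    def allocate(self, thread_id, register_contents):
        """Map one virtual register per thread register and store its content."""
        if len(register_contents) > len(self.free):
            raise MemoryError("not enough free internal addresses")
        for virtual_reg, content in enumerate(register_contents):
            internal = self.free.pop(0)      # lowest free address; need not be contiguous
            self.mapping[(thread_id, virtual_reg)] = internal
            self.slots[internal] = content

    def read(self, thread_id, virtual_reg):
        """Retrieve content through the virtual register mapping."""
        return self.slots[self.mapping[(thread_id, virtual_reg)]]

    def release(self, thread_id):
        """De-allocate every virtual register of a finished thread."""
        for key in [k for k in self.mapping if k[0] == thread_id]:
            internal = self.mapping.pop(key)
            self.slots[internal] = None
            self.free.append(internal)
        self.free.sort()


rf = UnifiedRegisterFile(num_slots=6)
rf.allocate("A", ["a0", "a1"])        # A occupies internal addresses 0 and 1
rf.allocate("B", ["b0", "b1"])        # B occupies internal addresses 2 and 3
rf.release("A")                       # addresses 0 and 1 become free again
rf.allocate("C", ["c0", "c1", "c2"])  # C lands on 0, 1 and 4: non-contiguous
```

In this sketch thread C sees contiguous virtual registers 0, 1 and 2, while the backing internal addresses are 0, 1 and 4, which is the contiguous or non-contiguous storage the abstract refers to.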
134 Citations
32 Claims
1. A multi-threaded processor comprising:
a thread scheduler configured to receive a thread, allocate one or more virtual registers mapped to one or more internal addresses in a unified memory space, and store content of thread registers associated with the thread in the one or more mapped internal addresses in the unified memory space;
a unified register file coupled to the thread scheduler, the unified register file including the unified memory space that stores the thread registers; and
a processing unit coupled to the unified register file and configured to retrieve content of thread registers from the internal addresses in the unified memory space based on the virtual register mapping.
2. The multi-threaded processor of claim 1 wherein the processing unit is further configured to
process data in the thread to obtain a result, and store the result to one or more other virtual registers mapped to one or more other internal addresses in the unified memory space.
3. The multi-threaded processor of claim 1 wherein the thread scheduler determines the size of the received thread registers and dynamically maps the one or more internal addresses in the unified memory space based on the size of the received thread registers.
4. The multi-threaded processor of claim 1 wherein the thread scheduler is configured to de-allocate the one or more virtual registers once the processing unit has processed the thread.
5. The multi-threaded processor of claim 1 wherein the mapped internal addresses in the unified memory space are non-contiguous.
6. The multi-threaded processor of claim 1 wherein the one or more virtual registers have contiguous addresses.
7. The multi-threaded processor of claim 1 wherein the thread scheduler maintains a mapping table of allocated virtual registers mapped to internal addresses in the unified memory space.
8. The multi-threaded processor of claim 1 wherein the thread includes pixel data for a plurality of pixels.
9. The multi-threaded processor of claim 8 wherein the unified register file is divided into a plurality of memory banks where the pixel data is stored across two or more of the plurality of memory banks.
10. The multi-threaded processor of claim 9 wherein the memory banks include a plurality of simultaneous read and write ports that permit stored content for thread registers to be read from the memory banks while content for new thread registers is stored to the same memory banks.
11. The multi-threaded processor of claim 9 wherein the processing unit is configured to retrieve pixel data for two or more pixels stored across two or more memory banks in a single clock cycle.
12. A multi-threaded processor comprising:
means for receiving a thread having associated thread registers;
means for mapping one or more virtual registers to one or more internal addresses in a unified memory space;
means for allocating one or more virtual registers to the thread registers; and
means for storing content of the thread registers in the one or more internal addresses associated with the allocated virtual registers.
13. The multi-threaded processor of claim 12 further comprising:
means for retrieving content for the thread registers from the one or more internal addresses based on the virtual register mapping.
14. The multi-threaded processor of claim 12 further comprising:
means for processing content of the thread registers stored in the one or more internal addresses to obtain a result; and
means for storing the result to one or more other virtual registers mapped to one or more other internal addresses in the unified memory space.
15. The multi-threaded processor of claim 12 further comprising:
means for determining the size of the received thread registers; and
means for dynamically mapping to the one or more internal addresses based on the size of the received thread registers.
16. The multi-threaded processor of claim 12 wherein the thread includes pixel data for a plurality of pixels.
17. The multi-threaded processor of claim 12 further comprising:
means for de-allocating the one or more virtual registers once the processing unit has processed the thread.
18. A method comprising:
receiving a thread at a multi-threaded processor, the thread including associated thread registers;
mapping one or more virtual registers to one or more internal addresses in a unified memory space of the multi-threaded processor;
allocating the one or more virtual registers to the received thread; and
storing content of thread registers in the one or more internal addresses associated with the allocated virtual registers.
19. The method of claim 18 wherein the one or more internal addresses are allocated to correspond to the size of the received thread registers.
20. The method of claim 18 further comprising:
retrieving content for the thread registers stored in the one or more internal addresses; and
de-allocating the one or more internal addresses once the thread has been processed.
21. The method of claim 18 further comprising:
re-allocating the one or more virtual registers to a new thread.
22. The method of claim 18 further wherein the thread includes pixel data for a plurality of pixels.
23. The method of claim 18 wherein the pixel data is stored across two or more memory storage devices.
24. The method of claim 23 further comprising:
retrieving the pixel data stored in the two or more memory storage devices while simultaneously storing data to the two or more memory storage devices.
25. A machine-readable medium having one or more instructions for processing multiple threads in a multi-threaded processor, which when executed by a processor causes the processor to:
map one or more virtual registers to one or more internal registers in a unified memory space of the multi-threaded processor; and
allocate the one or more virtual registers to a received thread.
26. The machine-readable medium of claim 25 further having one or more instructions which when executed by a processor causes the processor to:
store one or more registers of the received thread in the one or more internal registers associated with the allocated virtual registers;
retrieve content of the thread registers stored in the one or more internal addresses; and
de-allocate the one or more internal addresses once the thread has been processed.
27. The machine-readable medium of claim 25 further having one or more instructions which when executed by a processor causes the processor to:
re-allocate the one or more of the internal addresses to a new thread.
28. A graphics processor comprising:
a cache memory for receiving external instructions;
a texture engine for storing graphics texture data;
a multi-threaded processor coupled to the cache memory and texture engine, the multi-threaded processor configured to receive a thread having associated thread registers;
determine the size of received thread registers;
dynamically map one or more virtual registers to one or more internal addresses in a unified memory space of the multi-threaded processor; and
allocate the one or more virtual registers to the received thread registers.
29. The graphics processor of claim 28 wherein the multi-threaded processor is further configured to
store content of the thread registers in the one or more internal addresses associated with the allocated virtual registers.
30. The graphics processor of claim 28 wherein the multi-threaded processor is further configured to
obtain one or more instructions associated with the received thread from the cache memory; and
obtain texture data associated with the received thread from the texture engine.
31. A wireless communication device comprising:
a communication interface to wirelessly communicate with other devices;
a graphics processor coupled to provide graphics data to the display unit, the graphics processor configured to receive a thread including thread registers;
dynamically map one or more virtual registers to one or more internal addresses in a unified memory space of the graphics processor; and
allocate the one or more virtual registers to the received thread registers.
32. The wireless communication device of claim 31 wherein the graphics processor is further configured to
store content of the thread registers in the one or more internal addresses associated with the allocated virtual registers.
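The banked storage recited in the claims on memory banks and pixel data (pixel data spread across memory banks so that several pixels can be retrieved in a single clock cycle) can be pictured with a short Python sketch. Everything named here is an assumption made for illustration: the class `BankedRegisterFile`, the helper `cycles_needed`, the choice of four banks, and the simplified rule that each bank serves one access per cycle. The claims themselves do not prescribe this model.

```python
from collections import Counter

NUM_BANKS = 4  # assumed: one bank per pixel of a four-pixel thread


class BankedRegisterFile:
    """Toy register file split into independently ported banks (hypothetical)."""

    def __init__(self, rows_per_bank):
        self.banks = [[None] * rows_per_bank for _ in range(NUM_BANKS)]

    def store_quad(self, row, pixels):
        """Stripe four pixel values across the four banks at the same row."""
        if len(pixels) != NUM_BANKS:
            raise ValueError("expected one value per bank")
        for bank, value in enumerate(pixels):
            self.banks[bank][row] = value

    def load_quad(self, row):
        """Read one value from every bank; each bank is touched exactly once."""
        return [bank[row] for bank in self.banks]


def cycles_needed(accesses):
    """Cycles to serve (bank, row) accesses if a bank serves one access per cycle."""
    per_bank = Counter(bank for bank, _row in accesses)
    return max(per_bank.values(), default=0)


quad_rf = BankedRegisterFile(rows_per_bank=2)
quad_rf.store_quad(0, ["p0", "p1", "p2", "p3"])
striped = [(bank, 0) for bank in range(NUM_BANKS)]   # one access per bank
single_bank = [(0, row) for row in range(4)]         # four accesses to bank 0
```

Under the assumed one-access-per-bank rule, `cycles_needed(striped)` evaluates to 1 while `cycles_needed(single_bank)` evaluates to 4, which is the motivation for striping the pixels of a thread across banks.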
Specification