Apparatus and method for memory-hierarchy aware producer-consumer instruction
First Claim
1. A method for transferring a chunk of data from a core of a central processing unit (CPU) to a graphics processing unit (GPU), comprising:
- executing a first instruction, the first instruction being a single instruction, wherein the first instruction comprises a MovNonAllocate store instruction, the executing comprising:
responsive to the first instruction, writing data, without caching the data, to a buffer within the core of the CPU until a designated amount of data has been written, wherein the buffer combines multiple stores until the designated amount of data has been written, and
upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache shared by both the core and the GPU, wherein the cache is a level 3 cache;
setting an indication to indicate to the GPU that data is available in the cache; and
upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
Abstract
An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
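The flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `SharedCache`, `CombiningBuffer`, `producer`, and `consumer` are hypothetical, the "designated amount" is assumed to be one 64-byte line, and the payload is assumed to be a whole number of lines.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumption: the "designated amount" is one 64-byte line


@dataclass
class SharedCache:
    """Stands in for the cache accessible by both the core and the GPU."""
    lines: dict = field(default_factory=dict)  # address -> bytes of one line
    data_ready: bool = False                   # the "indication" for the GPU


@dataclass
class CombiningBuffer:
    """Core-local buffer that merges stores until a full line is written."""
    cache: SharedCache
    base: int = 0                                          # next line address
    pending: bytearray = field(default_factory=bytearray)  # combined stores

    def store(self, chunk: bytes) -> None:
        """Model one store into the buffer; evict whenever a line is full."""
        self.pending += chunk
        while len(self.pending) >= LINE_BYTES:
            self._evict()

    def _evict(self) -> None:
        """Eviction cycle: move one full line into the shared cache."""
        self.cache.lines[self.base] = bytes(self.pending[:LINE_BYTES])
        del self.pending[:LINE_BYTES]
        self.base += LINE_BYTES


def producer(cache: SharedCache, payload: bytes, store_size: int = 16) -> None:
    """CPU side: issue small stores, then set the indication."""
    buf = CombiningBuffer(cache)
    for i in range(0, len(payload), store_size):
        buf.store(payload[i:i + store_size])
    cache.data_ready = True  # tell the consumer that data is available


def consumer(cache: SharedCache):
    """GPU side: read from the shared cache only once the indication is set."""
    if not cache.data_ready:
        return None
    return b"".join(cache.lines[addr] for addr in sorted(cache.lines))
```

In this model, calling `producer(cache, payload)` with a 128-byte payload triggers two eviction cycles, after which `consumer(cache)` returns the same bytes. The model ignores partially filled lines and any ordering guarantees real hardware would need between the eviction and the indication.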
37 Citations
21 Claims
1. A method for transferring a chunk of data from a core of a central processing unit (CPU) to a graphics processing unit (GPU), comprising:
- executing a first instruction, the first instruction being a single instruction, wherein the first instruction comprises a MovNonAllocate store instruction, the executing comprising:
  - responsive to the first instruction, writing data, without caching the data, to a buffer within the core of the CPU until a designated amount of data has been written, wherein the buffer combines multiple stores until the designated amount of data has been written, and
  - upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache shared by both the core and the GPU, wherein the cache is a level 3 cache;
- setting an indication to indicate to the GPU that data is available in the cache; and
- upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An instruction processing apparatus comprising:
- at least one core of a central processing unit (CPU) and a cache shared by both the core and a graphics processing unit (GPU); and
- the core comprising CPU-GPU producer-consumer logic configured to perform the operations of:
  - executing a first instruction, the first instruction being a single instruction, wherein the first instruction comprises a MovNonAllocate store instruction, the executing comprising:
    - writing data, without caching the data, to a buffer within the core of the CPU until a designated amount of data has been written, wherein the buffer combines multiple stores until the designated amount of data has been written, and
    - upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to the cache shared by both the core and the GPU, wherein the cache is a level 3 cache; and
  - setting an indication to indicate to the GPU that data is available in the cache;
- wherein upon the GPU detecting the indication, the data is provided to the GPU from the cache upon receipt of a read signal from the GPU.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer system comprising:
- a graphics processor unit (GPU) for processing a set of graphics instructions to render video; and
- a central processing unit (CPU) comprising:
  - at least one core and a cache shared by both the core and the GPU; and
  - the core comprising CPU-GPU producer-consumer logic configured to perform the operations of:
    - executing a first instruction, the first instruction being a single instruction, wherein the first instruction comprises a MovNonAllocate store instruction, the executing comprising:
      - writing data to a buffer within the core of the CPU until a designated amount of data has been written, wherein the buffer combines multiple stores until the designated amount of data has been written, and
      - upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to the cache shared by both the core and the GPU, wherein the cache is a level 3 cache; and
    - setting an indication to indicate to the GPU that data is available in the cache;
- wherein upon the GPU detecting the indication, the data is provided to the GPU from the cache upon receipt of a read signal from the GPU.
Specification