Method and apparatus for distributing processing core workloads among processing cores
First Claim
1. A method for workload distribution between a first processing core and a second processing core comprising:
- providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue associated with the workgroups executing on the first processing core atomically and within a device scope of the first processing core and the second processing core; and
- when a queue level of the first donation queue is below a first threshold, stealing one or more queue elements from a second donation queue associated with workgroups executing on the second processing core to the first donation queue atomically and within the device scope of the first processing core and the second processing core.
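The two steps of claim 1 can be sketched in Python under stated assumptions. Each core is modeled as a hypothetical `CoreQueues` object holding one deque per workgroup plus one shared donation deque. A `threading.Lock` stands in for the device-scope atomics the claim requires (on a GPU these would be atomic operations at device scope, for example OpenCL 2.0's `memory_scope_device`). The `donate` and `steal_if_low` names, the `keep`, `threshold`, and `batch` parameters, and the choice of which end of each queue to take from are illustrative policies, not details from the claim.

```python
import threading
from collections import deque


class CoreQueues:
    """Hypothetical model of one processing core: one deque per workgroup
    executing on the core, plus a single donation queue shared by them."""

    def __init__(self, workgroup_items):
        self.workgroup_queues = [deque(items) for items in workgroup_items]
        self.donation_queue = deque()
        # Stand-in for device-scope atomics: guards this core's donation queue.
        self.lock = threading.Lock()


def donate(core, keep=1):
    """Move elements from each workgroup queue into the core's donation queue,
    leaving `keep` elements in each workgroup queue (illustrative policy)."""
    with core.lock:
        for wq in core.workgroup_queues:
            while len(wq) > keep:
                core.donation_queue.append(wq.pop())


def steal_if_low(thief, victim, threshold, batch=2):
    """If the thief's donation-queue level is below `threshold`, move up to
    `batch` elements from the victim's donation queue into the thief's.
    Returns the number of elements stolen."""
    if thief is victim:
        return 0
    # Take both locks in a fixed order so two concurrent thieves cannot deadlock.
    first, second = sorted((thief, victim), key=id)
    with first.lock, second.lock:
        if len(thief.donation_queue) >= threshold:
            return 0  # level is not below the threshold: nothing to do
        stolen = 0
        while stolen < batch and victim.donation_queue:
            # Steal from the head; donors append at the tail.
            thief.donation_queue.append(victim.donation_queue.popleft())
            stolen += 1
        return stolen


core0 = CoreQueues([[1, 2, 3, 4], [5, 6, 7]])  # heavily loaded core
core1 = CoreQueues([[8], []])                  # lightly loaded core
donate(core0)
donate(core1)
n = steal_if_low(core1, core0, threshold=1)
print(n, list(core1.donation_queue), list(core0.donation_queue))  # 2 [4, 3] [2, 7, 6]
```

Holding both locks across the level check and the transfer keeps the "below threshold, then steal" decision atomic, which is the property the claim attaches to both operations. A GPU implementation might instead use atomic operations on queue head and tail indices rather than locks.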
Abstract
Briefly, methods and apparatus are disclosed that rebalance workloads among processing cores using a hybrid work-donation and work-stealing technique, reducing workload imbalance within processing devices such as GPUs. In one example, the methods and apparatus distribute workload between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue, which may also be associated with the workgroups executing on the first processing core. The methods and apparatus also determine whether a queue level of the first donation queue is below a threshold and, if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.
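The end-to-end effect described in the abstract, a lightly loaded core pulling work that a heavily loaded core has donated, can be illustrated with a small threaded simulation. This is a sketch under stated assumptions, not the disclosed implementation: each Python thread stands in for one processing core, each core has a single local queue in place of several workgroup queues, a core consumes its own donation queue once its local queue is empty, `time.sleep` stands in for executing a work item, and one lock per core stands in for device-scope atomics. The names `Core`, `next_item`, and `run_core` and the `threshold`, `donate_above`, and `batch` parameters are hypothetical.

```python
import threading
import time
from collections import deque


class Core:
    """Hypothetical per-core state: a local work queue and a donation queue."""

    def __init__(self, items):
        self.local = deque(items)
        self.donation = deque()
        self.lock = threading.Lock()  # stand-in for device-scope atomics
        self.processed = []


def next_item(me, other, threshold, donate_above, batch):
    """Return the next work item for `me`, or None if no work is reachable."""
    # Both cores' locks, always taken in the same order to avoid deadlock.
    first, second = sorted((me, other), key=id)
    with first.lock, second.lock:
        # Work donation: publish surplus local work to this core's donation queue.
        while len(me.local) > donate_above:
            me.donation.append(me.local.pop())
        # Work stealing: refill from the other core when our donation level is low.
        if len(me.donation) < threshold:
            for _ in range(batch):
                if not other.donation:
                    break
                me.donation.append(other.donation.popleft())
        if me.local:
            return me.local.popleft()
        if me.donation:
            return me.donation.popleft()
        return None


def run_core(me, other, threshold=2, donate_above=4, batch=2):
    while (item := next_item(me, other, threshold, donate_above, batch)) is not None:
        time.sleep(0.001)  # stand-in for executing the work item
        me.processed.append(item)


heavy = Core(range(40))  # imbalanced start: most work on one core
light = Core(range(40, 42))
threads = [
    threading.Thread(target=run_core, args=(heavy, light)),
    threading.Thread(target=run_core, args=(light, heavy)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(heavy.processed), len(light.processed))  # split varies between runs
```

Thread scheduling decides how many of the heavy core's items the light core ends up stealing, so no fixed split is shown. Every item is still processed exactly once, because an item only moves between queues inside a section that holds both locks.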
17 Claims
1. A method for workload distribution between a first processing core and a second processing core comprising:
- providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue associated with the workgroups executing on the first processing core atomically and within a device scope of the first processing core and the second processing core; and
- when a queue level of the first donation queue is below a first threshold, stealing one or more queue elements from a second donation queue associated with workgroups executing on the second processing core to the first donation queue atomically and within the device scope of the first processing core and the second processing core.
7. An electronic device comprising:
a processor that is operative to:
- provide queue elements from one or more workgroup queues associated with workgroups executing on a first processing core to a first donation queue associated with the workgroups executing on the first processing core atomically and within a device scope of the first processing core and the second processing core; and
- when a queue level of the first donation queue is below a first threshold, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core to the first donation queue atomically and within the device scope of the first processing core and the second processing core.
15. A non-transitory computer readable medium comprising executable instructions that, when executed by one or more processors, cause the one or more processors to:
- provide queue elements from one or more workgroup queues associated with workgroups executing on a first processing core to a first donation queue associated with the workgroups executing on the first processing core atomically and within a device scope of the first processing core and the second processing core; and
- when a queue level of the first donation queue is below a first threshold, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core to the first donation queue atomically and within the device scope of the first processing core and the second processing core.
Specification