Dynamic load balancing of instructions for execution by heterogeneous processing engines
Abstract
An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
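As an illustration of the balancing idea in the abstract, the following is a minimal sketch and not the patented implementation. The type labels, the `Engine` class, and the `assign` function are hypothetical names introduced here. The sketch also assumes that workload is approximated by the number of instructions queued on each engine, which the abstract does not specify.

```python
from dataclasses import dataclass, field

# Hypothetical labels; the abstract only says that three instruction types exist.
ONLY_FIRST = "only_first"    # executable only by the first type of engine
ONLY_SECOND = "only_second"  # executable only by the second type of engine
EITHER = "either"            # executable by both types of engine


@dataclass
class Engine:
    name: str
    queue: list = field(default_factory=list)


def assign(program, first, second):
    """Route each instruction; 'either' instructions go to the shorter queue."""
    for op, kind in program:
        if kind == ONLY_FIRST:
            target = first
        elif kind == ONLY_SECOND:
            target = second
        else:
            # Assumption: workload is approximated by queue length.
            # Ties are broken in favor of the first engine.
            target = first if len(first.queue) <= len(second.queue) else second
        target.queue.append(op)
    return first, second


program = [
    ("mul", ONLY_FIRST),
    ("mul2", ONLY_FIRST),
    ("add", EITHER),
    ("sin", ONLY_SECOND),
    ("mov", EITHER),
]
first, second = assign(program, Engine("first"), Engine("second"))
print(first.queue, second.queue)
```

In this sketch the decision for each instruction of the shared type is made at assignment time from the current queue state, which is one simple way to read "dynamically determine" in the abstract.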
52 Citations
15 Claims
1. A computer-implemented method for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising:
- computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
- assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
- computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
- computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
- determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine;
- overriding the target specified by the dual-issue instruction;
- assigning the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count;
- receiving a control instruction;
- extracting a target address from the control instruction; and
- reading and executing one or more instructions starting at the target address.
Dependent claims: 2, 3, 4, 5.
6. A non-transitory computer-readable medium storing instructions for causing a SIMD architecture processor that includes heterogeneous processing engines to dynamically load balance instruction execution by performing the steps of:
- computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
- assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
- computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
- computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
- determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine;
- overriding the target specified by the dual-issue instruction;
- assigning the dual-issue program instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count;
- receiving a control instruction;
- extracting a target address from the control instruction; and
- reading and executing one or more instructions starting at the target address.
Dependent claims: 7, 8, 9, 10.
11. A system for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising:
- a first processing engine of the heterogeneous processing engines that is configured to execute dual-issue instructions and instructions of a first type in parallel for multiple threads in a SIMD thread group, wherein only the first processing engine is configured to execute instructions of the first type;
- a second processing engine of the heterogeneous processing engines that is configured to execute dual-issue instructions and instructions of a second type that is different than the first type in parallel for the multiple threads in the SIMD thread group, wherein only the second processing engine is configured to execute instructions of the second type;
- a work distribution unit coupled to the first processing engine and the second processing engine and configured to:
  - compute, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with the first processing engine and a second initial weighted instruction count associated with the second processing engine, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
  - assign the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
  - compute a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
  - compute a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
  - determine that a first weighted instruction count associated with the first processing engine is greater than a second weighted instruction count associated with the second processing engine, override the target specified by the dual-issue instruction, and assign the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count; and
- an instruction unit included in the first processing engine that is configured to:
  - receive a control instruction, extract a target address from the control instruction, and read and cause one or more instructions in the set of unassigned instructions to be executed at the target address.
Dependent claims: 12, 13, 14, 15.
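The weighted-count steps recited in the independent claims above can be sketched as follows. This is a simplified reading under stated assumptions, not the claimed implementation: `Instr`, `initial_counts`, and `balance` are hypothetical names, latencies are in arbitrary units, the weighted count is taken to be the plain sum of latencies, and program order, dependencies, and control instructions are ignored.

```python
from dataclasses import dataclass
from typing import Optional

FIRST, SECOND = 0, 1  # hypothetical engine identifiers


@dataclass
class Instr:
    name: str
    latency: int                # expected latency, arbitrary units (assumed)
    only_engine: Optional[int]  # FIRST or SECOND if single-issue, None if dual-issue
    target: int = FIRST         # engine that a dual-issue instruction specifies


def initial_counts(unassigned):
    """Weighted counts before assignment; dual-issue instructions weigh zero."""
    counts = [0, 0]
    for ins in unassigned:
        if ins.only_engine is not None:
            counts[ins.only_engine] += ins.latency
    return counts


def balance(unassigned):
    """Return (per-engine lists of instruction names, final weighted counts)."""
    counts = initial_counts(unassigned)
    queues = ([], [])
    # Single-issue instructions can only go to their own engine.
    for ins in unassigned:
        if ins.only_engine is not None:
            queues[ins.only_engine].append(ins.name)
    # Dual-issue instructions keep their specified target unless that engine
    # carries the larger weighted count, in which case the target is overridden.
    for ins in unassigned:
        if ins.only_engine is None:
            other = SECOND if ins.target == FIRST else FIRST
            engine = other if counts[ins.target] > counts[other] else ins.target
            queues[engine].append(ins.name)
            counts[engine] += ins.latency
    return queues, counts


instrs = [
    Instr("fmul", 4, FIRST),
    Instr("fadd", 4, FIRST),
    Instr("rcp", 2, SECOND),
    Instr("mov", 1, None),   # dual-issue, targets FIRST by default
    Instr("mov2", 1, None),  # dual-issue, targets FIRST by default
]
queues, counts = balance(instrs)
print(queues, counts)
```

Here the first engine starts with the larger weighted count, so both dual-issue instructions have their specified target overridden and are placed on the second engine. Giving dual-issue instructions zero weight in the initial counts means those counts reflect only work that cannot be moved.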
Specification