Dynamic load balancing of instructions for execution by heterogeneous processing engines

US 8,578,387 B1
Filed: 07/31/2007
Issued: 11/05/2013
Est. Priority Date: 07/31/2007
Status: Active Grant

Abstract

An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
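As an illustration only, the three instruction types described in the abstract can be modeled as a small capability table. The names `InstrType`, `ENGINE_CAPS`, and `eligible_engines` are hypothetical and do not appear in the patent; this is a minimal sketch, not the patented implementation.

```python
from enum import Enum


class InstrType(Enum):
    FIRST = 1   # executable only by the first type of processing engine
    SECOND = 2  # executable by both types of processing engine
    THIRD = 3   # executable only by the second type of processing engine


# Hypothetical capability table: which instruction types each engine accepts.
ENGINE_CAPS = {
    "engine_1": {InstrType.FIRST, InstrType.SECOND},
    "engine_2": {InstrType.SECOND, InstrType.THIRD},
}


def eligible_engines(instr_type):
    """Return the sorted names of the engines able to execute instr_type."""
    return sorted(name for name, caps in ENGINE_CAPS.items() if instr_type in caps)
```

Only instructions of the second type return more than one eligible engine, so they are the only ones for which an assignment unit has a choice to make.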


15 Claims

1. A computer-implemented method for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising:
- computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
  
  assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
  
  computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
  
  computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
  
  determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine;
  
  overriding the target specified by the dual-issue instruction;
  
  assigning the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count;
  
  receiving a control instruction;
  
  extracting a target address from the control instruction; and
  
  reading and executing one or more instructions starting at the target address.
- - 2. The computer-implemented method of claim 1, further comprising:
    - determining that a second instruction is assigned to the first processing engine for execution; and
      
      computing a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.
  - 3. The computer-implemented method of claim 2, further comprising updating the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.
  - 4. The computer-implemented method of claim 3, further comprising updating the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.
  - 5. The computer-implemented method of claim 1, wherein a first weighted value computed for execution of the dual issue instruction by the first processing engine does not equal a second weighted value computed for execution of the dual issue program instruction by the second processing engine.
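One possible reading of claims 1 through 5 can be sketched in Python as follows. This is a simplified model under stated assumptions, not the patented implementation: the names `Instr`, `initial_counts`, `place_dual`, and `on_dispatch`, the engine labels `"e1"` and `"e2"`, and the latency numbers are all hypothetical. The sketch assumes the weighted value of an instruction equals its expected latency on the chosen engine, and it applies the override symmetrically, whereas the claims only recite the case where the first engine is the target.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str            # "first_only", "second_only", or "dual"
    latency: dict        # hypothetical expected latency per engine, e.g. {"e1": 4}
    target: str = "e1"   # engine that a dual-issue instruction names as its target


def initial_counts(unassigned):
    """Initial weighted counts: dual-issue instructions weigh zero on both engines."""
    counts = {"e1": 0, "e2": 0}
    for ins in unassigned:
        if ins.kind == "first_only":
            counts["e1"] += ins.latency["e1"]
        elif ins.kind == "second_only":
            counts["e2"] += ins.latency["e2"]
        # kind == "dual": contributes zero to both initial counts
    return counts


def place_dual(ins, counts):
    """Choose an engine for a dual-issue instruction and update its count."""
    engine = ins.target
    other = "e2" if engine == "e1" else "e1"
    if counts[engine] > counts[other]:
        engine = other                       # override the specified target
    counts[engine] += ins.latency[engine]    # add the weighted value (claim 3)
    return engine


def on_dispatch(ins, engine, counts):
    """Subtract the weighted value once the instruction is dispatched (claim 4)."""
    counts[engine] -= ins.latency[engine]
```

For example, with two first-only instructions of latency 4 and one second-only instruction of latency 2, the initial counts are 8 and 2. A dual-issue instruction that targets `"e1"` is then redirected to `"e2"`. Because the per-engine latencies of a dual-issue instruction may differ (claim 5), the weighted value added depends on which engine is chosen.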

6. A non-transitory computer-readable medium storing instructions for causing a SIMD architecture processor that includes heterogeneous processing engines to dynamically load balance instruction execution by performing the steps of:
- computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
  
  assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
  
  computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
  
  computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
  
  determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine;
  
  overriding the target specified by the dual-issue instruction;
  
  assigning the dual issue program instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count;
  
  receiving a control instruction;
  
  extracting a target address from the control instruction; and
  
  reading and executing one or more instructions starting at the target address.
- - 7. The non-transitory computer-readable medium of claim 6, further comprising the steps of:
    - determining that a second instruction is assigned to the first processing engine for execution; and
      
      computing a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.
  - 8. The non-transitory computer-readable medium of claim 7, further comprising the step of updating the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.
  - 9. The non-transitory computer-readable medium of claim 8, further comprising the step of updating the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.
  - 10. The non-transitory computer-readable medium of claim 6, wherein a first weighted value computed for execution of the dual issue instruction by the first processing engine does not equal a second weighted value computed for execution of the dual issue instruction by the second processing engine.

11. A system for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising:
- a first processing engine of the heterogeneous processing engines that is configured to execute dual issue instructions and instructions of a first type in parallel for multiple threads in a SIMD thread group, wherein only the first processing engine is configured to execute instructions of the first type;
  
  a second processing engine of the heterogeneous processing engines that is configured to execute dual issue instructions and instructions of a second type that is different than the first type in parallel for the multiple threads in the SIMD thread group, wherein only the second processing engine is configured to execute instructions of the second type;
  
  a work distribution unit coupled to the first processing engine and the second processing engine and configured to:
  
  compute, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with the first processing engine and a second initial weighted instruction count associated with the second processing engine, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine;
  
  assign the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count;
  
  compute a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine;
  
  compute a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine;
  
  determine that a first weighted instruction count associated with the first processing engine is greater than a second weighted instruction count associated with the second processing engine, override the target specified by the dual-issue instruction, and assign the dual issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count; and
  
  an instruction unit included in the first processing engine that is configured to:
  
  receive a control instruction, extract a target address from the control instruction, and read and cause one or more instructions in the set of unassigned instructions to be executed at the target address.
- - 12. The system of claim 11, wherein the work distribution unit is further configured to:
    - determine that a second instruction is assigned to the first processing engine for execution;
      
      compute a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.
  - 13. The system of claim 12, wherein the work distribution unit is further configured to update the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.
  - 14. The system of claim 13, wherein the work distribution unit is further configured to update the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.
  - 15. The system of claim 11, wherein a first weighted value computed for execution of the dual issue program instruction by the first processing engine does not equal a second weighted value computed for execution of the dual issue instruction by the second processing engine.
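The instruction unit steps recited in claim 11 (receive a control instruction, extract its target address, then read and execute instructions starting at that address) can be illustrated with a toy model. Everything here is an assumption made for illustration: the tuple encoding of instructions, the `"branch"` and `"halt"` opcodes, and the function name `run_from_control` are hypothetical and are not taken from the patent.

```python
def run_from_control(program, control):
    """Read instructions starting at the target address of a control instruction.

    program: list of (opcode, operand) tuples, indexed by address (hypothetical encoding).
    control: a ("branch", target_address) tuple (hypothetical encoding).
    Returns the instructions that were read, in order, stopping at "halt".
    """
    opcode, target = control              # extract the target address
    if opcode != "branch":
        raise ValueError("not a control instruction")
    executed = []
    pc = target
    while pc < len(program):
        op, arg = program[pc]             # read the instruction at the current address
        if op == "halt":
            break
        executed.append((op, arg))        # stand-in for actually executing it
        pc += 1
    return executed
```

In this sketch, "execution" is only recorded rather than performed, which keeps the example focused on the address extraction and sequential read.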

Current Assignee: NVIDIA Corporation
Original Assignee: NVIDIA Corporation
Inventors: Mills, Peter C.; Oberman, Stuart F.; Lindholm, John Erik; Liu, Samuel
Primary Examiner(s): Puente, Emerson
Assistant Examiner(s): Patel, Hiren P
Application Number: US11/831,873
Time in Patent Office: 2,289 Days
Field of Search: None
US Class Current: 718/105
CPC Class Codes:
G06F 2209/507   Low-level
G06F 9/3836   Instruction issuing, e.g. d...
G06F 9/3851   from multiple instruction s...
G06F 9/3887   controlled by a single inst...
G06F 9/5044   considering hardware capabi...
Y02D 10/00   Energy efficient computing,...
