Mechanism for efficient implementation of software pipelined loops in VLIW processors
Abstract
A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having N execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a program memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer to implement the zero overhead SFP loop. The size of the zero overhead SFP loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter, which counts the instructions executed in the SFP loop. The iteration count is loaded into an iteration counter, which counts the number of iterations of the SFP loop based on the block counter.
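The counter behaviour described in the abstract can be sketched in Python as follows. This is a minimal illustrative model, not the patented hardware: the class name `LoopCounters`, its fields, the count-down direction of both counters, and the assumption of one address per instruction in a contiguous loop body are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoopCounters:
    """Hypothetical model of the block counter and iteration counter."""
    start_address: int    # address of the first loop instruction
    block_count: int      # instructions per iteration (from the control register)
    iteration_count: int  # number of iterations (from the control register)

    def last_instruction_address(self) -> int:
        # Assumption: one address per instruction, contiguous loop body.
        return self.start_address + self.block_count - 1

    def run(self) -> List[int]:
        """Return the sequence of instruction addresses issued by the loop."""
        trace = []
        iterations_left = self.iteration_count  # iteration counter
        while iterations_left > 0:
            block_left = self.block_count       # block counter, reloaded per pass
            address = self.start_address
            while block_left > 0:
                trace.append(address)
                address += 1
                block_left -= 1
            # The block counter reaching zero marks the end of one iteration.
            iterations_left -= 1
        return trace
```

For example, `LoopCounters(start_address=0x100, block_count=3, iteration_count=2).run()` returns `[256, 257, 258, 256, 257, 258]`: the block counter walks the three instructions, and the iteration counter advances once per completed block.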
20 Claims
1. A system to implement a zero overhead software pipelined (SFP) loop, said system comprising:
a Very Long Instruction Word (VLIW) processor having N execution slots, wherein said VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
a program memory that receives a program memory address to fetch an instruction packet, wherein said program memory is closely coupled with an instruction buffer and a dispatcher to implement said zero overhead SFP loop, wherein said zero overhead SFP loop is at least one of a short zero overhead SFP loop and a long zero overhead SFP loop, and wherein a size of said zero overhead SFP loop exceeds said instruction buffer size;
at least one CPU control register comprising a block count and an iteration count, wherein said block count is loaded into a block counter and a last instruction address of said zero overhead SFP loop is computed to check whether said block count is greater than a maximum short block size, wherein when said block count is greater than said maximum short block size, a long SFP loop is executed, wherein when said block count is less than said maximum short block size, a short SFP loop is executed, and wherein said iteration count is loaded into an iteration counter and counts a number of iterations of said zero overhead SFP loop based on said block counter;
a loop instruction fetch logic that tracks at least one instruction of said plurality of instructions; and
a control logic that generates at least one of a plurality of control signals received by said instruction buffer, wherein said control signals are generated to execute said zero overhead SFP loop.
Dependent claims: 2, 3, 4, 5
6. A method of implementing a short software pipelined (SFP) loop in a system, said system comprising:
a processor having N execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
a program memory that receives a program memory address to fetch an instruction packet;
at least one CPU control register comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said short SFP loop, and said iteration count is loaded into an iteration counter and counts a number of iterations of said short SFP loop based on said block counter,
said method comprising:
determining when an instruction of said short SFP loop is encountered at execution packet boundaries;
storing a start address when said instruction is encountered;
storing an iteration count in said iteration counter and storing said block count in said block counter;
computing a last instruction address; and
determining when said block count is greater than a maximum short block size, wherein said maximum short block size is equal to a minimum depth of an instruction buffer minus a size of one fetch packet, and wherein said short SFP loop is executed when said block count is less than said maximum short block size.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
14. A method of implementing a long software pipelined (SFP) loop in a system, said system comprising:
a processor having N execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
a program memory that receives a program memory address to fetch an instruction packet, wherein said program memory is closely coupled with an instruction buffer and a dispatcher to implement said long SFP loop, and wherein a size of said long SFP loop exceeds said instruction buffer size;
at least one CPU control register comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said long SFP loop, and said iteration count is loaded into an iteration counter and counts a number of iterations of said long SFP loop based on said block counter,
said method comprising:
determining when an instruction of said long SFP loop is encountered at execution packet boundaries;
storing a start address when said instruction is encountered;
storing an iteration count and a block count;
computing a last instruction address; and
determining when said block count is greater than a maximum short block size, wherein said long SFP loop is executed when said block count is greater than said maximum short block size.
Dependent claims: 15, 16, 17, 18, 19, 20
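The short/long decision recited in claims 6 and 14 can be sketched in Python as below. The function names and the example sizes are hypothetical; only the relation "maximum short block size equals minimum instruction buffer depth minus one fetch packet" comes from claim 6. The claims do not say what happens when the block count equals the limit, so the sketch reports that case as unspecified rather than guessing.

```python
def max_short_block_size(min_buffer_depth: int, fetch_packet_size: int) -> int:
    """Limit from claim 6: minimum buffer depth minus one fetch packet.

    Both arguments are assumed to be in the same unit (e.g. instructions).
    """
    return min_buffer_depth - fetch_packet_size


def select_loop_mode(block_count: int,
                     min_buffer_depth: int,
                     fetch_packet_size: int) -> str:
    """Classify an SFP loop as 'short' or 'long' from its block count."""
    limit = max_short_block_size(min_buffer_depth, fetch_packet_size)
    if block_count > limit:
        return "long"         # claim 14: block count greater than the limit
    if block_count < limit:
        return "short"        # claim 6: block count less than the limit
    return "unspecified"      # equality is not addressed by the claims
```

With an assumed buffer depth of 16 and a fetch packet of 4, the limit is 12, so `select_loop_mode(8, 16, 4)` returns `"short"` and `select_loop_mode(20, 16, 4)` returns `"long"`.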
Specification