Mechanism for efficient implementation of software pipelined loops in VLIW processors

US 8,447,961 B2
Filed: 02/18/2010
Issued: 05/21/2013
Est. Priority Date: 02/18/2009
Status: Active Grant

Abstract

A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.
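
The two counters described in the abstract can be illustrated with a small software model. This is only a sketch: the `LoopControl` class and `run_counters` function are hypothetical names, and the model assumes one counter step per dispatch and that a non-positive count means the loop body never runs. It shows how a block counter and an iteration counter alone can sequence a loop without any branch instruction in the loop body.

```python
from dataclasses import dataclass


@dataclass
class LoopControl:
    """Hypothetical model of the control-register fields named in the abstract."""
    block_count: int      # dispatch steps per loop body (unit is an assumption)
    iteration_count: int  # number of times the loop body repeats


def run_counters(ctrl):
    """Return (iteration, position) for every dispatch, using two down-counters."""
    if ctrl.block_count <= 0 or ctrl.iteration_count <= 0:
        return []  # assumption: non-positive counts mean the body never runs
    trace = []
    iteration_counter = ctrl.iteration_count
    while iteration_counter > 0:
        block_counter = ctrl.block_count      # reloaded at the start of each iteration
        while block_counter > 0:
            trace.append((ctrl.iteration_count - iteration_counter,
                          ctrl.block_count - block_counter))
            block_counter -= 1                # one dispatch consumed
        iteration_counter -= 1                # block counter reached zero
    return trace
```

For example, `run_counters(LoopControl(block_count=2, iteration_count=2))` visits positions 0 and 1 of the body twice, for four dispatches in total.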

20 Claims

1. A system to implement a zero overhead software pipelined (SFP) loop, said system comprising:
- a Very Long Instruction Word (VLIW) processor having a N number of execution slots, said VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
  
  a program memory that receives a Program Memory address to fetch an instruction packet, wherein said program memory is closely coupled with an instruction buffer and a dispatcher to implement said zero overhead SFP loop, wherein said zero overhead SFP loop is at least one of a short zero overhead SFP loop and a long zero overhead SFP loop, and wherein a size of said zero overhead SFP loop exceeds said instruction buffer size;
  
  at least one CPU control register comprising a block count and a iteration count, wherein said block count is loaded into a block counter and a last instruction address of said zero overhead SFP loop is computed to check whether said block count is greater than a maximum short block size, wherein when said block count is greater than said maximum short block size, a long SFP loop is executed, wherein when said block count is less than said maximum short block size, a short SFP loop is executed, and wherein said iteration count is loaded into an iteration counter and counts a number of iterations of said zero overhead SFP loop based on said block counter;
  
  a loop instruction fetch logic that tracks at least one instruction of said plurality of instructions; and
  
  a control logic that generates at least one of a control signals received by a instruction buffer, wherein said control signals are generated to execute said zero overhead SFP loop.
- Dependent claims (2, 3, 4, 5):
  - 2. The system of claim 1, wherein said iteration counter is initially loaded and decremented by one when said block count reaches zero.
  - 3. The system of claim 1, wherein said block counter is initially loaded with the block count and decremented by one when said at least one instruction is dispatched by said dispatcher.
  - 4. The system of claim 1, wherein said zero overhead SFP loop reloads a fetch program address with a start address of said zero overhead SFP loop and continues until a last iteration when said number of instructions associated with said zero overhead SFP loop is being fetched.
  - 5. The system of claim 1, wherein execution of said long SFP loop takes place in parallel with a generation of a next program memory address.
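
The short/long decision recited in claim 1 can be sketched as a single comparison. The names `LoopMode` and `select_loop_mode` are hypothetical. The claim states only the "greater than" and "less than" cases, so treating a block count equal to the maximum short block size as a short loop is an assumption of this sketch, not something the claim specifies.

```python
from enum import Enum


class LoopMode(Enum):
    SHORT = "short"
    LONG = "long"


def select_loop_mode(block_count, max_short_block_size):
    """Pick the loop mode by comparing the block count with the short-loop limit."""
    if block_count > max_short_block_size:
        return LoopMode.LONG
    # Assumption: the equality case is not stated in the claim; it is
    # treated as a short loop here purely for illustration.
    return LoopMode.SHORT
```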

6. A method of implementing a short software pipelined (SFP) loop in a system, said system comprising:
- a processor having a N number of execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
  
  a program memory that receives a program memory address to fetch an instruction packet;
  
  at least one CPU control register comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said short SFP loop, and said iteration count is loaded into a iteration counter and counts a number of iterations of said short SFP loop based on said block counter, said method comprising;
  
  determining when an instruction of said short SFP loop is encountered at execution packet boundaries;
  
  storing a start address on said instruction being encountered;
  
  storing an iteration count in said iteration counter and storing said block count in said block counter;
  
  computing a last instruction address; and
  
  determining when said block count is greater than a maximum short block size, wherein said maximum short block size is equal to a minimum depth of instruction buffer minus a size of one fetch packet, and wherein said short SFP loop is executed when said block count is less than said maximum short block size.
- Dependent claims (7, 8, 9, 10, 11, 12, 13):
  - 7. The method of claim 6, further comprising:
    - generating a next program memory (PMEM) address;
      
      determining when said PMEM address is equal to a last instruction address;
      
      determining that an execution of said short SFP loop is finished when said PMEM address is equal to said last instruction; and
      
      generating a next PMEM address when said execution of said short SFP loop is finished.
  - 8. The method of claim 7, wherein no read request is sent to said PMEM address when said execution is not finished.
  - 9. The method of claim 6, further comprising:
    - starting, in parallel with said generating said next PMEM address, an execution of said short SFP loop;
      
      loading said iteration count into said iteration counter and saving a read pointer;
      
      loading said block count into said block counter and decrementing said iteration count;
      
      dispatching an execute packet;
      
      decrementing said block count;
      
      determining when said block count is equal to zero; and
      
      determining when said iteration count is equal to zero when said block count is equal to zero.
  - 10. The method of claim 9, wherein an execute packet of said short SFP loop is dispatched when said block count is not equal to zero.
  - 11. The method of claim 9, wherein an execution is exited from a loop execute instruction outside said short SFP loop when said iteration count is equal to zero.
  - 12. The method of claim 9, wherein a read pointer is reassigned with a saved read pointer when said iteration count is not equal to zero.
  - 13. The method of claim 12, wherein said block count is loaded into said block counter and said iteration count is decremented when said read pointer is reassigned to said saved read pointer.
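
The short-loop sequence in claims 6 and 9 to 13 can be sketched as a replay from the instruction buffer. The function names are hypothetical, and the model makes several assumptions: the buffer is a Python list holding one execute packet per entry, the block count is measured in execute packets, buffer depth and fetch packet size share the same unit, and the whole loop body is already resident in the buffer starting at `read_ptr`.

```python
def max_short_block_size(min_buffer_depth, fetch_packet_size):
    """Claim 6: minimum instruction buffer depth minus the size of one fetch packet."""
    return min_buffer_depth - fetch_packet_size


def run_short_loop(buffer, read_ptr, block_count, iteration_count):
    """Replay block_count packets from the buffer, iteration_count times."""
    if block_count < 1 or iteration_count < 1:
        raise ValueError("counts must be at least 1 in this sketch")
    dispatched = []
    iteration_counter = iteration_count   # load the iteration counter
    saved_read_ptr = read_ptr             # save the read pointer
    while True:
        block_counter = block_count       # load the block counter ...
        iteration_counter -= 1            # ... and decrement the iteration count
        while block_counter != 0:
            dispatched.append(buffer[read_ptr])  # dispatch an execute packet
            read_ptr += 1
            block_counter -= 1
        if iteration_counter == 0:
            break                         # leave the loop after the last iteration
        read_ptr = saved_read_ptr         # rewind to the saved read pointer
    return dispatched
```

Calling `run_short_loop(["A", "B", "C", "D"], read_ptr=1, block_count=2, iteration_count=3)` replays packets `B` and `C` three times without touching program memory again, which is the point of keeping a short loop inside the buffer.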

14. A method of implementing a long software pipelined (SFP) loop in a system, said system comprising:
- a processor having a N number of execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
  
  a program memory that receives a program memory address to fetch an instruction packet, wherein said program memory is closely coupled with an instruction buffer and a dispatcher to implement said long SFP loop, and wherein a size of said long SFP loop exceeds said instruction buffer size;
  
  at least one CPU control register comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said long SFP loop, and said iteration count is loaded into the iteration counter and counts a number of iterations of said long SFP loop based on said block counter, said method comprising;
  
  determining when an instruction of said long SFP loop is encountered at execution packet boundaries;
  
  storing a start address when said instruction is encountered;
  
  storing an iteration count and a block count;
  
  computing a last instruction address; and
  
  determining when said block count is greater than a maximum short block size, wherein said long SFP loop is executed when said block count is greater than said maximum short block size.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The method of claim 14, further comprising:
    - generating a next program memory (PMEM) address;
      
      determining when said PMEM address is equal to a last instruction address;
      
      determining that an execution is finished when said PMEM address is equal to said last instruction address; and
      
      generating a next PMEM address when said execution is finished.
  - 16. The method of claim 15, wherein an execution is exited from a loop execute instruction outside said long SFP loop when said block count and said iteration count values are equal to zero.
  - 17. The method of claim 15, wherein a start address is sent to said PMEM address when said execution of said long SFP loop is not finished.
  - 18. The method of claim 14, further comprising:
    - starting, in parallel with said generating said next PMEM address, an execution of said long SFP loop;
      
      loading said iteration count into said iteration counter;
      
      loading said block count into said block counter and decrementing said iteration count;
      
      dispatching an execute packet;
      
      decrementing said block count;
      
      determining when said block count is equal to zero; and
      
      determining when said iteration count is equal to zero when said block count is equal to zero.
  - 19. The method of claim 18, wherein an execute packet is dispatched when said block count is not equal to zero.
  - 20. The method of claim 18, wherein said block count is loaded into a block counter and said iteration count is decremented when said iteration count is not equal to zero.
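
For the long loop, the fetch side described in claims 15 and 17 can be sketched as an address generator that wraps back to the start address. The function name is hypothetical. The claims say a last instruction address is computed but do not say how, so this sketch assumes one fetch per address, addresses that increase by 1, and `last = start + block_count - 1`. Execution, which the claims describe as running in parallel with address generation, is not modeled.

```python
def long_loop_fetch_addresses(start_address, block_count, iteration_count):
    """Return (program memory addresses fetched for the loop, next address after it)."""
    if block_count < 1 or iteration_count < 1:
        raise ValueError("counts must be at least 1 in this sketch")
    # Assumption: the last address is derived from the start address and block count.
    last_address = start_address + block_count - 1
    addresses = []
    pmem_address = start_address
    remaining_iterations = iteration_count
    while remaining_iterations > 0:
        addresses.append(pmem_address)        # read request sent to program memory
        if pmem_address == last_address:
            remaining_iterations -= 1
            pmem_address = start_address      # not finished: send the start address again
        else:
            pmem_address += 1                 # otherwise generate the next address
    # Finished: fetching continues with the address after the loop body.
    return addresses, last_address + 1
```

Unlike the short-loop case, every iteration re-reads program memory here, because the loop body is larger than the instruction buffer can hold.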

Current Assignee: Saankhya Labs Pvt Ltd.
Original Assignee: Saankhya Labs Pvt Ltd.
Inventors: Saha, Anindya; Kumar, Manish; Mallapur, Hemant; Billava, Santhosh; Rajangam, Viji
Primary Examiner(s): LI, AIMEE J
Application Number: US12/708,288
Publication Number: US 20100211762A1
Time in Patent Office: 1,188 Days
Field of Search: 712/24, 712/241
US Class Current: 712/241
CPC Class Codes:
G06F 9/30065   Loop control instructions; ...
G06F 9/325   for loops, e.g. loop detect...
G06F 9/381   Loop buffering
G06F 9/3853   of compound instructions
