Mechanism for Efficient Implementation of Software Pipelined Loops in VLIW Processors

US 20100211762A1
Filed: 02/18/2010
Published: 08/19/2010
Est. Priority Date: 02/18/2009
Status: Active Grant


Assignments: 1

Petitions: 0

Abstract

A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having N execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.
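The counting scheme in the abstract (spelled out in claims 2 and 3) can be sketched as a small software model. This is a hypothetical Python illustration, not the patented hardware: it assumes the block counter is decremented once per dispatched execute packet, the iteration counter is decremented when the block counter reaches zero, and both counts are at least 1. The function name `dispatch_trace` is illustrative only.

```python
from typing import List, Tuple


def dispatch_trace(block_count: int, iteration_count: int) -> List[Tuple[int, int]]:
    """Return the (iteration, packet-within-block) pairs a zero overhead loop dispatches.

    Hypothetical model: the block counter counts execute packets in one pass of the
    loop body; when it reaches zero the iteration counter is decremented and the
    block counter is reloaded, and the loop ends when the iteration counter hits zero.
    """
    if block_count < 1 or iteration_count < 1:
        raise ValueError("block and iteration counts are assumed to be >= 1")
    trace: List[Tuple[int, int]] = []
    block_counter = block_count          # loaded from the CPU control register
    iteration_counter = iteration_count  # loaded from the CPU control register
    while True:
        # Dispatch one execute packet and record where we are in the loop.
        trace.append((iteration_count - iteration_counter, block_count - block_counter))
        block_counter -= 1
        if block_counter == 0:
            iteration_counter -= 1
            if iteration_counter == 0:
                break                    # last packet of the last iteration
            block_counter = block_count  # reload for the next pass; no extra packet dispatched
    return trace
```

For example, `len(dispatch_trace(4, 10))` is 40: four packets per pass times ten passes. In this model the reload at the end of each pass costs no dispatched packet, which is the sense in which the loop has zero overhead.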


20 Claims

1. A system to implement a zero overhead software pipelined (SFP) loop, said system comprising:
   - a Very Long Instruction Word (VLIW) processor having N execution slots, wherein said VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
   - a program memory that receives a Program Memory address to fetch an instruction packet, wherein said program memory is closely coupled with said instruction buffer size to implement said zero overhead software pipelined (SFP) loop, and wherein the size of said zero overhead software pipelined (SFP) loop can exceed said instruction buffer size;
   - CPU control registers comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said zero overhead software pipelined (SFP) loop, and said iteration count is loaded into an iteration counter and counts a number of iterations of said zero overhead software pipelined (SFP) loop based on said block counter;
   - a loop instruction fetch logic that tracks at least one instruction of said plurality of instructions; and
   - a control logic that generates at least one control signal received by an instruction buffer, wherein said at least one control signal is generated to execute said zero overhead software pipelined (SFP) loop.

2. The system of claim 1, wherein said iteration counter is initially loaded and decremented by one when the block count reaches zero.

3. The system of claim 1, wherein said block counter is initially loaded with the block count and decremented by one when an instruction is dispatched.

4. The system of claim 1, wherein said zero overhead software pipelined (SFP) loop reloads the fetch program address with a start address of said zero overhead software pipelined (SFP) loop when said number of instructions associated with said zero overhead software pipelined (SFP) loop has been fetched, and continues until the last iteration.

5. The system of claim 1, wherein said zero overhead software pipelined (SFP) loop is at least one of a short zero overhead software pipelined (SFP) loop and a long zero overhead software pipelined (SFP) loop.
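The fetch-address behaviour of claim 4 (reload the fetch address with the loop start address once the last instruction of the body has been fetched, until the last iteration) can be sketched as an address generator. This is a hypothetical Python model: it assumes the loop body occupies consecutive integer addresses, one per fetch packet, and the names `fetch_addresses` and `tail` are illustrative, not taken from the patent.

```python
from typing import List


def fetch_addresses(start: int, last: int, iterations: int, tail: int = 0) -> List[int]:
    """Program-memory fetch addresses for a loop whose body is re-fetched (hypothetical model).

    start, last: addresses of the first and last fetch packets of the loop body,
                 assumed to be consecutive integers, one address per fetch packet.
    iterations:  number of loop iterations (assumed >= 1).
    tail:        how many sequential post-loop addresses to append (illustration only).
    """
    if iterations < 1 or last < start:
        raise ValueError("need iterations >= 1 and last >= start")
    addresses: List[int] = []
    addr = start
    remaining = iterations
    while True:
        addresses.append(addr)
        if addr == last:                 # last instruction of the body has been fetched
            remaining -= 1
            if remaining == 0:
                break                    # last iteration done: fetching falls through
            addr = start                 # reload the fetch address with the start address
        else:
            addr += 1                    # ordinary sequential fetch
    addresses.extend(range(last + 1, last + 1 + tail))
    return addresses
```

For instance, `[hex(a) for a in fetch_addresses(0x100, 0x102, 2, tail=1)]` gives `['0x100', '0x101', '0x102', '0x100', '0x101', '0x102', '0x103']`: the body is fetched twice, then fetching continues past the loop.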

6. A method of implementing a short software pipelined (SFP) loop in a system, said system comprising:
   - a processor having N execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
   - a program memory that receives a program memory address to fetch an instruction packet; and
   - CPU control registers comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in said short SFP loop, and said iteration count is loaded into an iteration counter and counts a number of iterations of said short SFP loop based on said block counter;

   said method comprising:
   - determining if an instruction of said short SFP loop is encountered at the execution packet boundaries;
   - storing a start address when said instruction is encountered;
   - storing an iteration count in said iteration counter and said block count in said block counter;
   - computing a last instruction address; and
   - determining if said block count is greater than a maximum short block size, wherein said maximum short block size is equal to the minimum depth of the instruction buffer minus the size of one fetch packet, and wherein said short SFP loop is executed when said block count is less than said maximum short block size.

7. The method of claim 6, further comprising:
   - generating a next program memory (PMEM) address;
   - determining if said PMEM address is equal to a last instruction address;
   - determining that an execution of said short SFP loop is finished if said PMEM address is equal to said last instruction address; and
   - generating a next PMEM address if said execution of said short SFP loop is finished.

8. The method of claim 6, further comprising:
   - starting, in parallel with said generating said next PMEM address, an execution of said short SFP loop;
   - loading said iteration count into said iteration counter and saving a read pointer;
   - loading said block count into said block counter and decrementing said iteration count;
   - dispatching an execute packet;
   - decrementing said block count;
   - determining if said block count is equal to zero; and
   - determining if said iteration count is equal to zero if said block count is equal to zero.

9. The method of claim 8, wherein an execute packet of said short SFP loop is dispatched when said block count is not equal to zero.

10. The method of claim 8, wherein execution exits the loop and executes an instruction outside said short SFP loop if said iteration count is equal to zero.

11. The method of claim 7, wherein no read request is sent to Program Memory (PMEM) if said execution is not finished.

12. The method of claim 8, wherein a read pointer is reassigned with the saved read pointer if said iteration count is not equal to zero.

13. The method of claim 12, wherein a block count is loaded into a block counter and an iteration count is decremented when said read pointer is reassigned to the saved read pointer.
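The short-loop execution of claims 8 to 13 (save a read pointer at loop entry, dispatch execute packets from the instruction buffer, and rewind to the saved read pointer instead of re-reading program memory) can be sketched as follows. This is a hypothetical Python model, not the patented circuit: it assumes the whole body has already been placed in the buffer on the first pass, that `len(body)` is the block count, and that the iteration count is at least 1. The function name `run_short_loop` is illustrative.

```python
from typing import List, Tuple


def run_short_loop(body: List[str], iteration_count: int) -> Tuple[List[str], int]:
    """Replay a short SFP loop from the instruction buffer (hypothetical model).

    body: the loop's execute packets in program order; len(body) is the block count.
    Returns (dispatched packets, number of packets read from PMEM).
    Assumption: the body is read from PMEM once, and every later iteration is
    served from the instruction buffer by rewinding the read pointer.
    """
    if not body or iteration_count < 1:
        raise ValueError("need a non-empty body and iteration_count >= 1")
    buffer = list(body)               # first pass: body fetched from PMEM into the buffer
    pmem_reads = len(body)            # no further PMEM reads while the loop runs (cf. claim 11)
    block_count = len(body)

    iteration_counter = iteration_count
    saved_read_ptr = 0                # read pointer saved at loop entry (claim 8)
    read_ptr = saved_read_ptr
    block_counter = block_count
    iteration_counter -= 1            # claim 8: load block count, decrement iteration count

    dispatched: List[str] = []
    while True:
        dispatched.append(buffer[read_ptr])   # dispatch one execute packet
        read_ptr += 1
        block_counter -= 1
        if block_counter == 0:
            if iteration_counter == 0:
                break                         # claim 10: leave the loop
            read_ptr = saved_read_ptr         # claim 12: rewind to the saved read pointer
            block_counter = block_count       # claim 13: reload the block count ...
            iteration_counter -= 1            # ... and decrement the iteration count
    return dispatched, pmem_reads
```

With a three-packet body and two iterations, `run_short_loop(["P0", "P1", "P2"], 2)` returns `(['P0', 'P1', 'P2', 'P0', 'P1', 'P2'], 3)`: six packets are dispatched while only three are read from program memory.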

14. A method of implementing a long SFP loop in a system, said system comprising:
    - a processor having N execution slots, wherein said processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size;
    - a program memory that receives a program memory address to fetch an instruction packet, wherein said program memory is closely coupled with said instruction buffer size to implement said long SFP loop, and wherein the size of said long SFP loop can exceed said instruction buffer size; and
    - CPU control registers comprising a block count and an iteration count, wherein said block count is loaded into a block counter and counts said plurality of instructions executed in an SFP loop, and said iteration count is loaded into an iteration counter and counts a number of iterations of said SFP loop based on said block counter;

    said method comprising:
    - determining if an instruction of said long SFP loop is encountered at the execution packet boundaries;
    - storing a start address when said instruction is encountered;
    - storing an iteration count and a block count;
    - computing a last instruction address; and
    - determining if said block count is greater than a maximum short block size, wherein said long SFP loop is executed when said block count is greater than said maximum short block size.

15. The method of claim 14, further comprising:
    - generating a next program memory (PMEM) address;
    - determining if said PMEM address is equal to a last instruction address;
    - determining that an execution is finished if said PMEM address is equal to said last instruction address; and
    - generating a next PMEM address if said execution is finished.

16. The method of claim 14, further comprising:
    - starting, in parallel with said generating said next PMEM address, an execution of said long SFP loop;
    - loading said iteration count into said iteration counter;
    - loading said block count into said block counter and decrementing said iteration count;
    - dispatching an execute packet;
    - decrementing said block count;
    - determining if said block count is equal to zero; and
    - determining if said iteration count is equal to zero if said block count is equal to zero.

17. The method of claim 15, wherein execution exits the loop and executes an instruction outside said long SFP loop if said block count and said iteration count are both equal to zero.

18. The method of claim 15, wherein a start address is sent to said program memory (PMEM) if said execution of said long SFP loop is not finished.

19. The method of claim 16, wherein an execute packet is dispatched if said block count is not equal to zero.

20. The method of claim 16, wherein said block count is loaded into a block counter and an iteration count is decremented if said iteration count is not equal to zero.
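Claims 6 and 14 together split loops into short and long by comparing the block count with the maximum short block size (minimum instruction-buffer depth minus one fetch packet). The sketch below models that decision and the resulting program-memory traffic. It is a hypothetical Python illustration under stated assumptions: the block count, buffer depth and fetch-packet size are taken to be in the same unit; the case where the block count equals the limit is not covered by either claim and is treated as long here; and a long loop is assumed to re-read its whole body from PMEM on every iteration (following claim 18, which re-sends the start address). All function names are illustrative.

```python
from typing import Literal

LoopKind = Literal["short", "long"]


def max_short_block_size(min_buffer_depth: int, fetch_packet_size: int) -> int:
    # Claim 6: minimum instruction-buffer depth minus the size of one fetch packet.
    return min_buffer_depth - fetch_packet_size


def classify_loop(block_count: int, min_buffer_depth: int, fetch_packet_size: int) -> LoopKind:
    limit = max_short_block_size(min_buffer_depth, fetch_packet_size)
    # Claim 6: short when block_count < limit; claim 14: long when block_count > limit.
    # block_count == limit is not covered by the claims; treating it as long is an assumption.
    return "short" if block_count < limit else "long"


def pmem_packet_reads(kind: LoopKind, block_count: int, iteration_count: int) -> int:
    # Assumption: a short loop reads its body from PMEM once and replays it from the
    # buffer; a long loop re-sends the start address and re-reads the body every iteration.
    return block_count if kind == "short" else block_count * iteration_count
```

With a hypothetical 32-entry buffer and 8-entry fetch packets the limit is 24, so a 16-packet body is classified as short and a 40-packet body as long; over 100 iterations the long loop reads 4000 packets from program memory, whereas the short loop reads its 16 packets once.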


Current Assignee: Saankhya Labs Pvt Ltd.
Original Assignee: Saankhya Labs Pvt Ltd.
Inventors: Kumar, Manish; Saha, Anindya; Billava, Santhosh; Mallapur, Hemant; Rajangam, Viji

Granted Patent: US 8,447,961 B2
US Class Current: 712/205
CPC Class Codes:
- G06F 9/30065: Loop control instructions; ...
- G06F 9/325: for loops, e.g. loop detect...
- G06F 9/381: Loop buffering
- G06F 9/3853: of compound instructions
