
Single-chip multiprocessor with cycle-precise program scheduling of parallel execution

  • US 7,143,401 B2
  • Filed: 02/20/2001
  • Issued: 11/28/2006
  • Est. Priority Date: 02/17/2000
  • Status: Expired due to Fees
First Claim

1. A method of static clock cycle scheduling of parallel execution of a program on a single chip multiprocessor including K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, the single chip multiprocessor further including memory access means, first means of synchronization signal exchange between the processors, second means of register file and control registers data exchange between the K processors and third means providing for coherent state of internal processor caches, the method comprising:

  • compiling a source program with a clock cycle scheduling of the program execution so that a compiled source program is a set of consecutive super wide instructions each of which contains no more than M=KN operations;

    partitioning the compiled source program into no more than K parallel streams for a parallel execution on K processors of the single chip multiprocessor by means of partitioning of each super wide instruction into no more than K wide instructions, wherein each wide instruction contains no more than N operations and the partitioning of each super wide instruction minimizes data and control dependencies between specified K parallel streams;

    defining a sequence of execution of any two parallel streams in synchronization points using additional synchronization operations that are included with wide instructions of specified streams, the additional synchronization operations including “to permit execution of another parallel stream” (“permit”), “to wait execution of another parallel stream” (“wait”) and “to wait execution of another parallel stream then to permit this another parallel stream” (“wait and permit”) and “to be indifferent to another parallel stream” (“indifferent”), wherein;

    each “permit” operation of one stream has a proper “wait” operation in another stream of pair of streams and vice versa;



    a “permit” operation allows a proper “wait” operation completion in the another stream by “permit” signal transmitted to another processor through the first means;

    an execution of a current “permit” operation is locked by a lock signal received through the first means from a processor executing another stream when another processor does not complete proper “wait” operation for previous “permit” operation of its processor;

    an execution of “wait” operation is locked in its corresponding processor when a processor executing another stream does not complete the proper “permit” operation;



    a “wait and permit” operation consists of two operations “wait” and “permit” carried out consistently; and



    an “indifferent” operation defines an indifferent relation of its processor to the state of processor which executes another stream in a given synchronization point;

    transferring data from register files and control registers between any of two processors of the single chip multiprocessor through the second means using an entrance FIFO buffer in each processor, wherein additional data exchange operations are included with wide instructions of specified streams, wherein;



    a “transmit” operation reads a register file from its corresponding processor and then transmits data from the register file through the second means into a FIFO buffer of another processor;



    a “receive” operation reads data received at an entrance FIFO buffer of its corresponding processor and then writes the data to a register of a register file or a control register of its corresponding processor;



    a “transmit” operation execution is locked by lock signal transmitted through the second means from another processor if the entrance FIFO buffer of the another processor has a full state; and



    a “receive” operation execution is locked if the entrance FIFO buffer of its corresponding processor has an empty state;

    providing a coherent state of all internal processor caches of the single chip multiprocessor by transmitting address and data of all store operations through the third means from all processors to all processors for a correction of the cache contents;

    mutually controlling the parallel execution of the streams from the streams themselves by transmitting of branch target addresses between all processors of single chip multiprocessor through specified second means; and

    executing in parallel specified streams on the multiple processors with synchronization, data and control information exchange.
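
The “permit”/“wait” pairing and the “transmit”/“receive” FIFO exchange recited in the claim can be illustrated with a small software analogy. The sketch below is not the patented hardware and is not taken from the patent: threads stand in for processors, and the names `SyncChannel`, `Processor`, `run_demo` and the FIFO depth of 2 are all assumptions made for illustration. It models a “permit” as filling a one-slot channel (so a second “permit” blocks until the matching “wait” has consumed the first), and an entrance FIFO as a bounded queue (so “transmit” blocks when it is full and “receive” blocks when it is empty).

```python
import queue
import threading


class SyncChannel:
    """Hypothetical one-directional permit/wait pairing between two streams."""

    def __init__(self):
        # One slot: at most one outstanding permit at a time.
        self._slot = queue.Queue(maxsize=1)

    def permit(self):
        # Blocks while the previous permit has not yet been consumed by a wait.
        self._slot.put(True)

    def wait(self):
        # Blocks until the paired stream has issued its permit.
        self._slot.get()


class Processor:
    """Hypothetical processor with a register file and an entrance FIFO."""

    def __init__(self, fifo_depth=2):  # depth 2 is an arbitrary assumption
        self.registers = {}
        self.entrance_fifo = queue.Queue(maxsize=fifo_depth)

    def transmit(self, reg, target):
        # Read own register, push it into the target's entrance FIFO.
        # Blocks while the target FIFO is full.
        target.entrance_fifo.put(self.registers[reg])

    def receive(self, reg):
        # Pop from own entrance FIFO into a register.
        # Blocks while the FIFO is empty.
        self.registers[reg] = self.entrance_fifo.get()


def run_demo():
    p0, p1 = Processor(), Processor()
    p0.registers = {"r0": 10, "r1": 20, "r2": 30}
    ready, done = SyncChannel(), SyncChannel()
    log = []

    def stream0():
        log.append("stream0: permit")
        ready.permit()
        for reg in ("r0", "r1", "r2"):
            # May block on the third value until stream1 drains the FIFO.
            p0.transmit(reg, p1)
        done.wait()
        log.append("stream0: resumed")

    def stream1():
        ready.wait()
        for reg in ("r0", "r1", "r2"):
            p1.receive(reg)
        log.append("stream1: received")
        done.permit()

    threads = [threading.Thread(target=f) for f in (stream1, stream0)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, p1.registers


if __name__ == "__main__":
    print(run_demo())
```

In this sketch a “wait and permit” would simply be a `wait()` on one channel followed by a `permit()` on another, and an “indifferent” operation corresponds to touching no channel at that synchronization point.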

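
The partitioning step, in which each super wide instruction of at most M=KN operations is split into at most K wide instructions of at most N operations, can be sketched as follows. This is only a rough illustration under stated assumptions: the function name `partition_super_wide` and the string operation names are hypothetical, and the split is naive contiguous chunking. It enforces the size limits from the claim but does not attempt the dependency-minimizing partitioning that the claim also requires.

```python
from typing import List


def partition_super_wide(ops: List[str], k: int, n: int) -> List[List[str]]:
    """Split one super wide instruction into at most k wide instructions,
    each holding at most n operations (hypothetical helper)."""
    if len(ops) > k * n:
        raise ValueError("super wide instruction exceeds M = K*N operations")
    # Naive contiguous chunking: respects the K and N limits only.
    # A real compiler would choose the grouping to minimize data and
    # control dependencies between the resulting streams.
    return [ops[i:i + n] for i in range(0, len(ops), n)]


if __name__ == "__main__":
    # K = 3 processors, N = 2 operations per wide instruction, so M = 6.
    print(partition_super_wide(["add", "mul", "ld", "st", "sub"], k=3, n=2))
```

Each inner list corresponds to one wide instruction; the i-th list would be appended to the i-th parallel stream.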