Single-chip multiprocessor with clock cycle-precise program scheduling of parallel execution
Abstract
A single-chip multiprocessor system and a method of operating it, based on static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and for access to their store addresses and data. Each explicit parallelism architecture processor of the system has an interprocessor interface that provides exchange of synchronization signals, data exchange at the register file level, and access to the store addresses and data of other processors. The single-chip multiprocessor system uses instruction-level parallelism (ILP) to increase performance. Synchronization of the parallel execution of streams is ensured by special operations that set the order of execution of streams and stream fragments prescribed by the program algorithm.
37 Claims
1-16. (canceled)
17. A single-chip multiprocessor for executing programs compiled using a static clock-precise macro-scheduling program such that a compiled source program is a set of consecutive super-wide instructions, comprising:
K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, wherein the K processors are configured to execute no more than K parallel streams, wherein the K parallel streams are created by partitioning each super-wide instruction into no more than K wide instructions, and wherein each wide instruction contains no more than N operations and belongs to only one parallel stream; and
a synchronization signal exchanger between the processors, wherein the synchronization signal exchanger transmits “permit” signals that control a sequence of execution of any two parallel streams, and wherein specified streams include synchronization points having synchronization operations that are included with wide instructions of specified streams, the synchronization operations including “to permit execution of another parallel stream” (“permit”) and “to wait execution of another parallel stream” (“wait”),
wherein “permit” and “wait” operations are defined from each processor to every other processor, wherein a first processor having a first “permit” operation at a first synchronization point is configured to send a first “permit” signal to a second processor, and wherein a second processor having a corresponding “wait” operation at a different synchronization point is configured to check a presence of the first “permit” signal from the first processor and stop execution while the first “permit” signal is absent,
wherein K and N are integers, and wherein K is greater than one.
Dependent claims: 18-30.
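The “permit”/“wait” handshake recited in claim 17 can be illustrated with a small cycle-stepped model. This is only a sketch under stated assumptions, not the patented hardware: the `Processor` class, the `run` function, the tuple encoding of operations, the one-cycle signal latency, and the per-source counter of pending “permit” signals are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    program: list                      # ops: ("work", tag), ("permit", dst), ("wait", src)
    pc: int = 0
    permits: dict = field(default_factory=dict)   # src pid -> pending "permit" signals
    trace: list = field(default_factory=list)     # (cycle, event, arg)

    def done(self):
        return self.pc >= len(self.program)


def run(processors, max_cycles=100):
    """Step all processors in lockstep; return the number of cycles used."""
    by_id = {p.pid: p for p in processors}
    for cycle in range(max_cycles):
        if all(p.done() for p in processors):
            return cycle
        sent = []                      # signals issued this cycle
        for p in processors:
            if p.done():
                continue
            op, arg = p.program[p.pc]
            if op == "wait":
                # "wait": check for a "permit" from processor `arg`;
                # stop execution (stall) while the signal is absent.
                if p.permits.get(arg, 0) == 0:
                    p.trace.append((cycle, "stall", arg))
                    continue
                p.permits[arg] -= 1
            elif op == "permit":
                sent.append((p.pid, arg))
            p.trace.append((cycle, op, arg))
            p.pc += 1
        # Assumption: a signal becomes visible to its target one cycle later.
        for src, dst in sent:
            target = by_id[dst]
            target.permits[src] = target.permits.get(src, 0) + 1
    raise RuntimeError("streams did not finish (possible deadlock)")


# Stream 0 produces a result and then permits stream 1, which waits for it.
p0 = Processor(0, [("work", "produce"), ("permit", 1)])
p1 = Processor(1, [("wait", 0), ("work", "consume")])
cycles = run([p0, p1])
```

In this model stream 1 stalls at its “wait” until the “permit” from stream 0 arrives, so the “consume” step can never run ahead of “produce”, whatever the relative timing of the two streams.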
31. A multi-chip multiprocessor system comprising:
a plurality of single-chip multiprocessors, wherein at least one of the single-chip multiprocessors executes programs compiled using a static clock-precise macro-scheduling program such that a compiled source program is a set of consecutive super-wide instructions, wherein the at least one single-chip multiprocessor includes:
K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, wherein the K processors are configured to execute no more than K parallel streams, wherein the K parallel streams are created by partitioning each super-wide instruction into no more than K wide instructions, wherein each wide instruction contains no more than N operations and belongs to only one parallel stream; and
a synchronization signal exchanger between the processors, wherein the synchronization signal exchanger transmits “permit” signals that control a sequence of execution of any two parallel streams, and wherein specified streams include synchronization points having synchronization operations that are included with wide instructions of specified streams, the synchronization operations including “to permit execution of another parallel stream” (“permit”) and “to wait execution of another parallel stream” (“wait”),
wherein “permit” and “wait” operations are defined from each processor to every other processor, wherein a first processor having a first “permit” operation at a first synchronization point is configured to send a first “permit” signal to a second processor, and wherein a second processor having a corresponding “wait” operation is configured to check a presence of the first “permit” signal from the first processor and stop execution while the first “permit” signal is absent,
wherein K and N are integers, and wherein K is greater than one.
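The partitioning constraint shared by the independent claims (each super-wide instruction splits into no more than K wide instructions, each holding no more than N operations and belonging to exactly one stream) can be checked with a short sketch. The function name `partition_super_wide` and the representation of a super-wide instruction as a list of `(stream_id, operation)` pairs are assumptions made for illustration; the claims do not specify an encoding.

```python
def partition_super_wide(super_wide, k, n):
    """Split one super-wide instruction into per-stream wide instructions.

    super_wide: list of (stream_id, operation) pairs scheduled for one clock
                cycle (hypothetical encoding).
    Returns a dict mapping stream_id -> list of operations (one wide instruction).
    Raises ValueError if the result would need more than k wide instructions
    or any wide instruction would hold more than n operations.
    """
    wide = {}
    for stream_id, operation in super_wide:
        # Each operation lands in exactly one stream's wide instruction.
        wide.setdefault(stream_id, []).append(operation)

    if len(wide) > k:
        raise ValueError(f"{len(wide)} wide instructions exceed K={k}")
    for stream_id, ops in wide.items():
        if len(ops) > n:
            raise ValueError(f"stream {stream_id} has {len(ops)} operations, N={n}")
    return wide


# One cycle of a 2-processor, 2-operations-per-cycle configuration.
wide = partition_super_wide([(0, "add"), (1, "load"), (0, "mul")], k=2, n=2)
```

Here stream 0 receives the wide instruction `["add", "mul"]` and stream 1 receives `["load"]`; a third operation for stream 0 would violate the N limit and raise an error.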
32. A single-chip multiprocessor for executing programs compiled using a static clock-precise macro-scheduling program such that a compiled source program is a set of consecutive super-wide instructions, comprising:
K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, wherein the K processors are configured to execute no more than K parallel streams, wherein the K parallel streams are created by partitioning each super-wide instruction into no more than K wide instructions, and wherein each wide instruction contains no more than N operations and belongs to only one parallel stream; and
a register exchanger for providing data exchange between the streams at a processor register file or control register level using special exchange processor operations added in wide instructions of specified streams, wherein the exchange processor operations include:
“transmit” operations defined between each processor, wherein the “transmit” operations send data from a given internal processor register to respective other processors; and
“receive” operations defined for each processor, wherein the “receive” operations read data accepted from the respective other processor and lock the processor from running while transmitted data is absent,
wherein K and N are integers, and wherein K is greater than one.
Dependent claims: 33-35.
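The “transmit”/“receive” pair of claim 32 behaves like a blocking point-to-point channel between register files. The following is a minimal software analogy, assuming one unbounded queue per ordered processor pair; the `RegisterExchanger` class, its method names, and the `demo` function are hypothetical, and Python threads stand in for the processors.

```python
import queue
import threading


class RegisterExchanger:
    """Software stand-in for the register exchanger (illustrative only)."""

    def __init__(self, k):
        # One channel per ordered (source, destination) processor pair.
        self.links = {
            (src, dst): queue.Queue()
            for src in range(k) for dst in range(k) if src != dst
        }

    def transmit(self, src, dst, value):
        """Send a register value from processor src to processor dst."""
        self.links[(src, dst)].put(value)

    def receive(self, dst, src):
        """Read a value sent by src; blocks while transmitted data is absent."""
        return self.links[(src, dst)].get()


def demo():
    exchanger = RegisterExchanger(k=2)
    regs = [{"r1": 41}, {"r2": None}]     # per-processor register files

    def stream1():
        # Processor 1 is locked here until processor 0 transmits.
        regs[1]["r2"] = exchanger.receive(dst=1, src=0) + 1

    worker = threading.Thread(target=stream1)
    worker.start()
    exchanger.transmit(src=0, dst=1, value=regs[0]["r1"])
    worker.join()
    return regs[1]["r2"]
```

The blocking `get` plays the role of the lock on the receiving processor: the consuming stream cannot proceed until the producing stream has issued its “transmit”, regardless of which thread is scheduled first.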
36. A single-chip multiprocessor for executing programs compiled using a static clock-precise macro-scheduling program such that a compiled source program is a set of consecutive super-wide instructions, comprising:
K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, wherein the K processors are configured to execute no more than K parallel streams, wherein the K parallel streams are created by partitioning each super-wide instruction into no more than K wide instructions, wherein each wide instruction contains no more than N operations and belongs to only one parallel stream, wherein K and N are integers, and wherein K is greater than one;
a synchronization signal exchanger between the processors, wherein the synchronization signal exchanger transmits “permit” and lock signals that control a sequence of execution of any two parallel streams having synchronization points and synchronization operations included with wide instructions of specified streams, the synchronization operations including “to permit execution of another parallel stream” (“permit”) and “to wait execution of another parallel stream” (“wait”),
wherein “permit” and “wait” operations are defined from each processor to every other processor, wherein a first processor having a first “permit” operation at a first synchronization point is configured to send a first “permit” signal to a second processor, and wherein a second processor having a corresponding “wait” operation at a different synchronization point is configured to check a presence of the first “permit” signal from the first processor and stop execution while the first “permit” signal is absent, and
wherein an execution of a “permit” operation by a processor executing one stream is locked by a first lock signal received through the synchronization signal exchanger from another processor executing another stream when the another processor has not completed a “wait” operation corresponding to a “permit” operation previously received by the another processor;
a register exchanger for providing data exchange between the streams at a processor register file and control register level using special exchange processor operations “transmit” and “receive” added in wide instructions of specified streams, wherein the register exchanger transfers data between registers of two processors of the single-chip multiprocessor using an entrance FIFO buffer in each processor, wherein:
a “transmit” operation reads data from a register of a processor and then transmits this data from the register through the register exchanger to a FIFO buffer of another processor;
a “receive” operation reads data received at an entrance FIFO buffer of a processor and then writes this data to a register of that processor;
a “transmit” operation execution on a processor is locked by a second lock signal transmitted through the register exchanger from another processor if the entrance FIFO buffer of the another processor has a full state; and
a “receive” operation execution on a processor is locked if the entrance FIFO buffer of that processor has an empty state;
a cache exchanger for providing for coherent state of internal processor caches of the single-chip multiprocessor, wherein the cache exchanger transmits address and data of all store operations from all processors to all other processors for a correction of the cache contents; and
wherein the synchronization signal exchanger includes a multiple connection bus for transfer of synchronization signals, wherein the register exchanger includes a multiple connection bus for transfer of register files and control registers data, and wherein the cache exchanger includes a multiple connection bus for transfer of store addresses and stored data.
Dependent claim: 37.
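Two mechanisms of claim 36 lend themselves to a compact illustration: the entrance FIFO whose full and empty states lock “transmit” and “receive”, and the cache exchanger that forwards every store to the other processors. The sketch below is an assumption-laden model rather than the claimed circuit: `EntranceFifo`, `try_transmit`, `try_receive`, and `broadcast_store` are hypothetical names, a returned `False` stands for a locked (stalled) operation, and the cache “correction” is assumed to update only copies already present in another cache.

```python
from collections import deque


class EntranceFifo:
    """Bounded entrance buffer of one processor (illustrative model)."""

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()

    @property
    def full(self):
        return len(self.buf) >= self.depth

    @property
    def empty(self):
        return not self.buf


def try_transmit(dst_fifo, value):
    """Issue a "transmit"; return False if locked because the target FIFO is full."""
    if dst_fifo.full:
        return False
    dst_fifo.buf.append(value)
    return True


def try_receive(fifo, regs, reg):
    """Issue a "receive" into regs[reg]; return False if locked because the FIFO is empty."""
    if fifo.empty:
        return False
    regs[reg] = fifo.buf.popleft()
    return True


def broadcast_store(caches, src, addr, data):
    """Forward a store's address and data from processor src to every other cache.

    Assumption: other caches are corrected only if they already hold addr.
    """
    caches[src][addr] = data
    for pid, cache in enumerate(caches):
        if pid != src and addr in cache:
            cache[addr] = data


# A depth-1 FIFO accepts one value, then locks further transmits until drained.
fifo = EntranceFifo(depth=1)
regs = {}
first = try_transmit(fifo, 7)      # accepted
second = try_transmit(fifo, 8)     # locked: FIFO full
got = try_receive(fifo, regs, "r0")
```

Returning a boolean instead of blocking keeps the model deterministic; in a cycle-stepped simulation the caller would simply retry the locked operation on the next cycle.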
Specification