Single-chip multiprocessor with cycle-precise program scheduling of parallel execution
Abstract
A single-chip multiprocessor system and a method of operating the system, based on static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and for access to their store addresses and data. Each explicit parallelism architecture processor of the system has an interprocessor interface that provides exchange of synchronization signals, data exchange at the register file level, and access to the store addresses and data of other processors. The single-chip multiprocessor system uses instruction-level parallelism (ILP) to increase performance. Synchronization of the parallel execution of the streams is ensured by special operations that set the order of execution of streams and stream fragments prescribed by the program algorithm.
40 Citations
16 Claims
1. A method of static macro-scheduling of parallel execution of programs on a single-chip multiprocessor with explicit parallelism architecture processors, the method comprising:
scheduling of parallel program execution on multiple processors in a static clock-precise manner;
dividing the scheduled program into parallel streams in order to minimize data and control dependencies between different streams, wherein the number of instructions in a group for parallel execution in each stream does not exceed the abilities of the processor intended for parallel execution of a given stream;
defining a sequence of execution in synchronization points of each pair of streams as “one later than other,” “one earlier than other” and “simultaneous”;
executing the sequence of execution in the synchronization points;
directly exchanging data and address information between different streams at a register file level and data cache level; and
mutually controlling the streams from the executed program.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
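The pairwise ordering of synchronization points recited in claim 1 can be illustrated with a small sketch. It is not the patented scheduler: the names `Order`, `relation`, `check_schedule`, the stream labels, and the cycle numbers are all hypothetical, and the sketch assumes a static schedule can be summarized as the clock cycle at which each stream reaches a shared synchronization point.

```python
from enum import Enum


class Order(Enum):
    # The three relations named in claim 1 for a pair of streams.
    EARLIER = "one earlier than other"
    LATER = "one later than other"
    SIMULTANEOUS = "simultaneous"


def relation(cycle_a: int, cycle_b: int) -> Order:
    """Classify how stream A's sync point is ordered relative to stream B's."""
    if cycle_a < cycle_b:
        return Order.EARLIER
    if cycle_a > cycle_b:
        return Order.LATER
    return Order.SIMULTANEOUS


def check_schedule(sync_cycles, required):
    """Return the (a, b, wanted, actual) tuples for every violated constraint."""
    violations = []
    for (a, b), wanted in required.items():
        actual = relation(sync_cycles[a], sync_cycles[b])
        if actual is not wanted:
            violations.append((a, b, wanted, actual))
    return violations


# Hypothetical static schedule: cycle at which each stream hits the sync point.
sync_cycles = {"s0": 12, "s1": 12, "s2": 15}

# Hypothetical ordering constraints between pairs of streams.
required = {
    ("s0", "s1"): Order.SIMULTANEOUS,
    ("s0", "s2"): Order.EARLIER,
    ("s2", "s1"): Order.LATER,
}

violations = check_schedule(sync_cycles, required)
```

Because the schedule is fixed at compile time in this model, the check is a pure function of the cycle table; an empty `violations` list means every pair of streams meets its declared relation.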
10. A single chip multiprocessor system with explicit parallelism architecture processors for executing programs compiled using a static clock-precise macro-scheduling program and dividing schedules into streams for parallel execution on different processors, ensuring data exchange at a register file level, address and data exchange at a data cache level between the processors, as well as synchronization of parallel processor execution using special operations and control signals connecting all processors, the system comprising:
a plurality of explicit parallelism architecture processors, each processor including an instruction cache, control unit, multiple execution units, register file, data cache, memory control unit, array access unit, predicate file, bypass bus, and an interprocessor exchange subsystem for exchanging data, address information and signals controlling synchronization of the parallel processor execution, the interprocessor exchange subsystem comprising:
synchronization means for parallel processor operation including means for signal issue permitting synchronization operation execution in other processors and means for receiving other processor signals permitting synchronization operation execution in its respective processor;
means for data exchange between the register files, including means for data issue from the processor registers to other processors and means for receiving data from other processors for writing them into the registers of its own processor; and
means for address and data exchange to support processor cache coherence, including means for address and data transfer to other processors during execution of a store operation and a buffer for receiving and temporary storage of addresses and data from other processors, reviewed by all memory access operations of its respective processor;
interprocessor interfaces including multiple communication buses between the processors for transfer of data, address information and signals controlling synchronization of the parallel processor execution;
a system interface unit;
a shared cache unit; and
a plurality of units including I/O controllers, memory controllers, co-processors.
(Dependent claims: 11, 12, 13, 14, 15)
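The address and data exchange element of claim 10 (stores forwarded to other processors and held in a buffer that every memory access consults) can be sketched in software. This is only a behavioral model under simplifying assumptions: the `Processor` class, its `connect`, `store`, and `load` methods, and the dictionary-based cache and buffer are hypothetical, and timing, buffer capacity, and drain policy are ignored.

```python
class Processor:
    """Toy model of one processor with a data cache and a remote-store buffer."""

    def __init__(self, name):
        self.name = name
        self.data_cache = {}     # local view: address -> data
        self.remote_stores = {}  # buffer of stores received from other processors
        self.peers = []          # processors reachable over the interprocessor buses

    def connect(self, other):
        # Link two processors in both directions.
        self.peers.append(other)
        other.peers.append(self)

    def store(self, address, value):
        # A local store supersedes any buffered remote value for this address.
        self.remote_stores.pop(address, None)
        self.data_cache[address] = value
        # Transfer the address and data to every other processor.
        for peer in self.peers:
            peer.remote_stores[address] = value

    def load(self, address):
        # Every memory access reviews the remote-store buffer first.
        if address in self.remote_stores:
            return self.remote_stores[address]
        return self.data_cache.get(address)


p0, p1 = Processor("p0"), Processor("p1")
p0.connect(p1)

p0.store(0x10, 7)            # p1 sees the value through its buffer
seen_by_p1 = p1.load(0x10)

p1.store(0x10, 9)            # p1 overwrites; p0 sees the newer value
seen_by_p0 = p0.load(0x10)
```

Checking the buffer before the local cache is the design choice that keeps the two toy processors in agreement here without a shared directory; real hardware would also have to bound and drain the buffer.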
16. A multi-chip multiprocessor system comprising a multitude of single-chip multiprocessors based on explicit parallelism architecture processors for executing programs compiled using static clock-precise macro-scheduling of a program and dividing a schedule into streams for parallel execution on different processors, ensuring interaction between the single-chip multiprocessors by means of a system interface and main memory and interaction between the processors of the single-chip multiprocessor by means of data exchange at a register file level, address and data exchange at a data cache level, and synchronization of parallel processor execution using special operations and control signals connecting all processors of the single-chip multiprocessor.
Specification