Single-chip multiprocessor with cycle-precise program scheduling of parallel execution

US 7,143,401 B2
Filed: 02/20/2001
Issued: 11/28/2006
Est. Priority Date: 02/17/2000
Status: Expired due to Fees

Abstract

A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
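
The "access to store addresses and data of other processors" mentioned in the abstract corresponds to the coherence step of claim 1, in which the address and data of every store are sent to all processors so that they can correct their cache contents. The Python sketch below is one possible reading of that step, not the patented implementation. The `Cache` class, the `store` function, the write-through memory, and the choice to update (rather than invalidate) remote copies are all assumptions made for illustration.

```python
class Cache:
    """Hypothetical model of one processor's internal cache."""

    def __init__(self):
        self.lines = {}  # address -> data for the lines currently held

    def load(self, memory, address):
        # Fill the line from memory on a miss, then return the cached value.
        if address not in self.lines:
            self.lines[address] = memory.get(address, 0)
        return self.lines[address]

    def snoop_store(self, address, data):
        # Correct the local copy only if this cache already holds the address.
        if address in self.lines:
            self.lines[address] = data


def store(memory, caches, address, data):
    """Broadcast one store's address and data to every cache."""
    memory[address] = data  # assumption: memory is written through
    for cache in caches:    # stands in for the "third means" of claim 1
        cache.snoop_store(address, data)
```

For example, if one cache has loaded address `0x10` and any processor then stores to `0x10`, that cache sees the new value on its next load, while caches that never held the line are left unchanged.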

6 Claims

1. A method of static clock cycle scheduling of parallel execution of a program on a single chip multiprocessor including K explicit parallelism architecture processors each of which may decode no more than N operations each clock cycle, the single chip multiprocessor further including memory access means, first means of synchronization signal exchange between the processors, second means of register file and control registers data exchange between the K processors and third means providing for coherent state of internal processor caches, the method comprising:
- compiling a source program with a clock cycle scheduling of the program execution so that a compiled source program is a set of consecutive super wide instructions each of which contains no more than M=KN operations;
- partitioning the compiled source program into no more than K parallel streams for a parallel execution on K processors of the single chip multiprocessor by means of partitioning of each super wide instruction into no more than K wide instructions, wherein each wide instruction contains no more than N operations and the partitioning of each super wide instruction minimizes data and control dependencies between specified K parallel streams;
- defining a sequence of execution of any two parallel streams in synchronization points using additional synchronization operations that are included with wide instructions of specified streams, the additional synchronization operations including “to permit execution of another parallel stream” (“permit”), “to wait execution of another parallel stream” (“wait”) and “to wait execution of another parallel stream then to permit this another parallel stream” (“wait and permit”) and “to be indifferent to another parallel stream” (“indifferent”), wherein;
  - each “permit” operation of one stream has a proper “wait” operation in another stream of pair of streams and vice versa;
  - “permit” operation allows a proper “wait” operation completion in the another stream by “permit” signal transmitted to another processor through the first means;
  - an execution of a current “permit” operation is locked by a lock signal received through the first means from a processor executing another stream when another processor does not complete proper “wait” operation for previous “permit” operation of its processor;
  - an execution of “wait” operation is locked in its corresponding processor when a processor executing another stream does not complete the proper “permit” operation;
  - “wait and permit” operation consists of two operations “wait” and “permit” carried out consistently; and
  - “indifferent” operation defines an indifferent relation of its processor to the state of processor which executes another stream in a given synchronization point;
- transferring data from register files and control registers between any of two processors of the single chip multiprocessor through the second means using an entrance FIFO buffer in each processor, wherein additional data exchange operations are included with wide instructions of specified streams, wherein;
  - “transmit” operation reads a register file from its corresponding processor and then transmits data from the register file through the second means into a FIFO buffer of another processor;
  - “receive” operation reads data received at an entrance FIFO buffer of its corresponding processor and then writes the data to a register of a register file or a control register of its corresponding processor;
  - “transmit” operation execution is locked by lock signal transmitted through the second means from another processor if the entrance FIFO buffer of the another processor has a full state; and
  - “receive” operation execution is locked if the entrance FIFO buffer of its corresponding processor has an empty state;
- providing a coherent state of all internal processor caches of the single chip multiprocessor by transmitting address and data of all store operations through the third means from all processors to all processors for a correction of the cache contents;
- mutually controlling the parallel execution of the streams from the streams themselves by transmitting of branch target addresses between all processors of single chip multiprocessor through specified second means; and
- executing in parallel specified streams on the multiple processors with synchronization, data and control information exchange.
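
As a rough illustration of the partitioning step in claim 1, the Python sketch below splits one super wide instruction of at most M = K·N operations into at most K wide instructions of at most N operations each. The claim requires that the partitioning minimize data and control dependencies between streams but does not say how, so the greedy policy here is only an assumption and is not guaranteed to be minimal. The function name `partition_super_wide` and the `group` hint attached to each operation are hypothetical.

```python
from collections import defaultdict


def partition_super_wide(ops, k, n):
    """Split one super wide instruction into k wide instructions.

    ops: list of (name, group) pairs. `group` is a hypothetical compiler
    hint marking operations that depend on each other and would ideally
    share a stream.
    Returns k lists (some possibly empty), each with at most n names.
    """
    if len(ops) > k * n:
        raise ValueError("super wide instruction exceeds M = K*N operations")

    groups = defaultdict(list)
    for name, group in ops:
        groups[group].append(name)

    streams = [[] for _ in range(k)]
    # Greedy heuristic: place the largest dependency groups first, each
    # starting in the currently emptiest stream.
    for members in sorted(groups.values(), key=len, reverse=True):
        target = min(range(k), key=lambda i: len(streams[i]))
        for name in members:
            if len(streams[target]) == n:
                # The wide instruction is full: spill to the emptiest stream.
                target = min(range(k), key=lambda i: len(streams[i]))
            streams[target].append(name)
    return streams
```

With K = 2 and N = 3, five operations in two dependency groups end up in two wide instructions, one per group:

```python
ops = [("load a", "g0"), ("add", "g0"),
       ("load b", "g1"), ("mul", "g1"), ("store", "g1")]
print(partition_super_wide(ops, k=2, n=3))
# [['load b', 'mul', 'store'], ['load a', 'add']]
```
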
2. The method of claim 1, wherein synchronization between any two streams is performed not only for each event but also for a number of events using hardware reverse event counters in each processor of the single chip multiprocessor to register the number of the events between the pair of streams, wherein:
- “permit” operation completion increments the reverse event counter of a processor executing another stream by “permit” signal sent through the first means;
- “permit” operation completes if a current contents of the reverse event counter of a processor executing another stream is less than its highest possible value;
- “permit” operation execution is locked by lock signal received through the specified first means from a processor executing another stream if a current contents of the reverse event counter of another processor is equal to its highest possible value;
- “wait” operation completion decrements given reverse event counter of its corresponding processor;
- “wait” operation completes if a current contents of given reverse event counter of its corresponding processor is bigger than zero; and
- “wait” operation execution is locked if a current contents of given reverse event counter of its corresponding processor is equal to zero.
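
The counter rules of claim 2 can be modeled in a few lines of Python. This is a single-threaded sketch under stated assumptions, not a description of the hardware: the class name `ReverseEventCounter` and the `max_value` parameter are hypothetical, and the lock signal is represented simply by a `False` return value, meaning the operation must be retried later.

```python
class ReverseEventCounter:
    """Counter held by the waiting processor for one partner stream."""

    def __init__(self, max_value):
        if max_value < 1:
            raise ValueError("max_value must be at least 1")
        self.max_value = max_value  # highest possible counter value (assumed)
        self.value = 0

    def permit(self):
        """Partner stream's "permit": True if it completes, False if locked."""
        if self.value == self.max_value:
            return False  # locked: counter is at its highest possible value
        self.value += 1   # completion increments the counter
        return True

    def wait(self):
        """Local stream's "wait": True if it completes, False if locked."""
        if self.value == 0:
            return False  # locked: no outstanding "permit" event
        self.value -= 1   # completion decrements the counter
        return True
```

With `max_value=2`, two `permit()` calls complete and a third is locked until a `wait()` has completed. With `max_value=1` the behaviour appears to match the single-event handshake of claim 1, where a new "permit" is locked until the "wait" for the previous one has completed.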
3. The method of claim 2, wherein any synchronization point of any stream may define a set of different or equal synchronization operations for the set of another streams and where each processor of the single chip multiprocessor contains at least K−1 hardware reverse event counters for event counting from another K−1 processors.
4. The method of claim 1, wherein any “transmit” data exchange operation may define a set of the processors of the single chip multiprocessor for data receiving, any “receive” operation defines only any one processor transmitting data for it and in that case each processor of the single chip multiprocessor contains at least K−1 entrance FIFO buffers for data receiving from another K−1 processors.
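
The "transmit" and "receive" rules of claims 1 and 4 can be sketched as follows, with each processor holding K−1 entrance FIFO buffers, one per other processor. This is an illustrative model only. The `Processor` class, the `fifo_depth` parameter, and the all-or-nothing handling of a multi-receiver "transmit" (it is treated as locked if any target FIFO is full) are assumptions, and a `False` return value stands in for the lock signal.

```python
from collections import deque


class Processor:
    """Hypothetical model of one processor's registers and entrance FIFOs."""

    def __init__(self, pid, k, fifo_depth):
        self.pid = pid
        self.registers = {}
        self.fifo_depth = fifo_depth  # assumed capacity of each FIFO
        # One entrance FIFO per other processor: K-1 in total.
        self.entrance = {src: deque() for src in range(k) if src != pid}

    def fifo_full(self, src):
        return len(self.entrance[src]) >= self.fifo_depth


def transmit(sender, reg, receivers):
    """Send a register value to a set of other processors.

    `receivers` must not include `sender`. Returns False if locked.
    """
    if any(r.fifo_full(sender.pid) for r in receivers):
        return False  # locked: a target entrance FIFO is full
    value = sender.registers[reg]
    for r in receivers:
        r.entrance[sender.pid].append(value)
    return True


def receive(receiver, src_pid, reg):
    """Move one value from the FIFO fed by `src_pid` into a register.

    Returns False if locked.
    """
    fifo = receiver.entrance[src_pid]
    if not fifo:
        return False  # locked: the entrance FIFO is empty
    receiver.registers[reg] = fifo.popleft()
    return True
```

A "receive" names exactly one transmitting processor, which is why the FIFOs are keyed by sender in this sketch.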
5. The method of claim 1, wherein the synchronization operations may be included in any wide instruction of any stream with the purpose of synchronizing parallel execution of the streams.
6. The method of claim 1, wherein the synchronization operations may be included in all wide instructions of all streams with the purpose of realizing a clock cycle synchronization of parallel execution of the streams.

Current Assignee: Elbrus International Limited
Original Assignee: Elbrus International Limited
Inventors: Nazarov, Leonid N.; Gruzdov, Feodor A.; Rozhkov, Sergey A.; Tikhorsky, Vladimir V.; Chudakov, Mikhail L.; Sakhin, Yuli Kh.; Volkonskiy, Vladimir Yu.; Babaian, Boris A.
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Vu, Tuan A
Application Number: US09/789,850
Publication Number: US 20010042189A1
Time in Patent Office: 2,107 Days
Field of Search: 717/144-152, 717/119, 717/161, 717/116, 709/400, 709/248, 712/20-24, 712/203, 712/234, 718/102-104, 711/220
US Class Current: 717/149
CPC Class Codes:
G06F 9/5066 Algorithms for mapping a pl...
G06F 9/52 Program synchronisation; Mu...
