Boundary synchronization mechanism for a processor of a systolic array
Abstract
A mechanism synchronizes instruction code executing on a processor of a processing engine in an intermediate network station. The processing engine is configured as a systolic array having a plurality of processors arrayed as rows and columns. The mechanism comprises a boundary (temporal) synchronization mechanism for cycle-based synchronization within a processor of the array. The synchronization mechanism is generally implemented using specialized synchronization micro operation codes (“opcodes”).
20 Claims
1. A system for synchronizing instruction code executed by a processor of a processing engine in an intermediate network station, the processing engine configured as a systolic array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the system comprising:
a temporal synchronization mechanism associated with each processor of the array, the temporal synchronization mechanism including boundary logic configured to specify an offset from one of a start of a phase and relative to a marker at which execution of an instruction may be delayed.
5. A system for synchronizing instruction code executed by a processor of a processing engine in an intermediate network station, the processing engine configured as a systolic array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the system comprising:
a temporal synchronization mechanism associated with each processor of the array, the temporal synchronization mechanism including boundary logic configured to specify an offset from one of a start of a phase and relative to a marker at which execution of an instruction may be delayed;
boundary logic having a count register of the processor, the count register containing a timer value indicating a number of clock cycles that have elapsed since the start of phase;
a boundary reference threshold register containing a threshold value indicating an allowable duration for execution of a code sequence on the processor;
said each processor compares the timer value with the threshold value to determine whether further program instruction code execution can proceed without incurring stall cycles; and
a boundary reference (bref) instruction opcode that selectively resets the timer value to identify start of a new synchronization interval and that sets the threshold value used for comparison with the timer value.
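
The elements of claim 5 can be pictured with a small software model. The following is a minimal sketch only, assuming that a stall lasts until the timer value reaches the threshold value (as recited in claims 8 and 11); the class name `BoundaryLogic`, its methods, and the `reset` flag are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class BoundaryLogic:
    """Toy model of the boundary logic of claim 5 (all names are illustrative)."""

    count: int = 0      # count register: cycles elapsed since start of phase or last reset
    threshold: int = 0  # boundary reference threshold register

    def tick(self, cycles: int = 1) -> None:
        """Advance the timer value by a number of clock cycles."""
        self.count += cycles

    def bref(self, threshold: int, reset: bool) -> None:
        """Model of the bref opcode: optionally reset the timer, then set the threshold."""
        if reset:
            self.count = 0  # marks the start of a new synchronization interval
        self.threshold = threshold

    def stall_cycles(self) -> int:
        """Cycles the processor would have to stall before execution may proceed.

        Assumption: execution proceeds once count >= threshold.
        """
        return max(0, self.threshold - self.count)
```

In this model, a processor that executes `bref(10, reset=True)` and then runs for 6 cycles would need 4 stall cycles, while one that has already run for 12 cycles would need none.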
8. A method for synchronizing instruction code executed by a processor of a processing engine in an intermediate network station, the processing engine configured as a systolic array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the method comprising the steps of:
allowing execution of a code sequence on the processor for a minimum duration specified by a duration of a duration register;
if a code path executed by the processor completes before the specified duration, stalling the processor until the duration has elapsed; and
if a code path execution time is greater than or equal to the specified duration, continuing execution of the code sequence on the processor without incurring stall cycles.
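
The two branches of claim 8 reduce to simple arithmetic. As a hedged illustration (the function name and the tuple it returns are assumptions made for this sketch, not part of the claim):

```python
from typing import Tuple


def run_with_minimum_duration(path_cycles: int, duration: int) -> Tuple[int, int]:
    """Return (stall_cycles, total_cycles) for one code path.

    path_cycles: cycles the code path actually took (hypothetical input).
    duration:    minimum duration held in the duration register.
    """
    # A short path stalls until the duration has elapsed; a path that takes
    # at least the specified duration continues with no stall cycles.
    stall = max(0, duration - path_cycles)
    return stall, path_cycles + stall
```

With a duration of 12 cycles, a 7-cycle path and a 12-cycle path both finish at cycle 12, so code paths of different lengths end at the same point in time; a 15-cycle path simply continues with no stall.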
9. A method for synchronizing instruction code executed by a processor of a processing engine in an intermediate network station, the processing engine configured as a systolic array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the method comprising the steps of:
providing a synchronization operation code (opcode) adapted for execution by the processor;
interpreting the synchronization opcode based on a state of a predetermined bit of a machine state register within the processor; and
performing temporal synchronization when the predetermined bit is non-asserted.
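
The mode selection in claim 9 can be sketched as a single bit test. The bit position `SYNC_MODE_BIT` and the returned strings are hypothetical; the claim states only that temporal synchronization is performed when the predetermined bit is non-asserted and does not say how the opcode is interpreted otherwise.

```python
SYNC_MODE_BIT = 1 << 3  # hypothetical position of the predetermined bit in the MSR


def interpret_sync_opcode(msr: int) -> str:
    """Decide how a synchronization opcode is interpreted from the MSR state."""
    if msr & SYNC_MODE_BIT:
        # Bit asserted: behavior is not specified in claim 9, so the model
        # reports only that some other interpretation applies.
        return "other"
    # Bit non-asserted: perform temporal synchronization.
    return "temporal"
```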
11. A method for synchronizing instruction code within a processor of a multiprocessor system configured as a systolic array, the array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the method comprising the steps of:
initiating a cycle timer value responsive to the processor;
setting a threshold value to a specified number of cycles; and
suspending further execution of the instruction code until the timer value is greater than or equal to the threshold value.
13. A method for synchronizing instruction code within a processor of a multiprocessor system configured as a systolic array, the array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the method comprising the steps of:
initiating a cycle timer value responsive to the processor;
setting a threshold value to a specified number of cycles;
suspending further execution of the instruction code until the timer value is greater than or equal to the threshold value;
resetting the timer value at a beginning of a phase;
selectively resetting the timer value one or more times during the phase by executing a first instruction;
setting the threshold value by executing a second instruction.
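
To see how the steps of claim 13 combine within one phase, the sketch below walks a toy program through several synchronization intervals. The operation names (`work`, `reset_timer`, `set_threshold`, `wait`) are invented for illustration, the explicit `wait` step is an assumption about where suspension occurs, and the synchronization instructions themselves are modeled as taking zero cycles for simplicity.

```python
from typing import List, Tuple


def run_phase(program: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Simulate one phase; return (elapsed_cycles, stalled_cycles)."""
    timer = 0      # timer value, reset at the beginning of the phase
    threshold = 0  # threshold value
    elapsed = 0    # total cycles in the phase, including stalls
    stalled = 0    # cycles spent suspended
    for op, arg in program:
        if op == "work":             # ordinary instruction code taking `arg` cycles
            timer += arg
            elapsed += arg
        elif op == "reset_timer":    # the "first instruction" of the claim
            timer = 0
        elif op == "set_threshold":  # the "second instruction" of the claim
            threshold = arg
        elif op == "wait":           # suspend until timer >= threshold
            stall = max(0, threshold - timer)
            timer += stall
            elapsed += stall
            stalled += stall
        else:
            raise ValueError(f"unknown operation: {op}")
    return elapsed, stalled


# Two synchronization intervals within one phase.
program = [
    ("set_threshold", 10), ("work", 6), ("wait", 0),  # stalls 4 cycles
    ("reset_timer", 0),
    ("set_threshold", 8), ("work", 9), ("wait", 0),   # no stall needed
]
print(run_phase(program))  # (19, 4)
```

The first interval is padded from 6 to 10 cycles, while the second already exceeds its 8-cycle threshold and proceeds without stalling.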
15. Apparatus adapted to synchronize instruction code within a processor of a multiprocessor system configured as a systolic array, the array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the apparatus comprising:
means for initiating a cycle timer value responsive to the processor;
means for setting a threshold value to a specified number of cycles; and
means for suspending further execution of the instruction code until the timer value is greater than or equal to the threshold value.
16. Apparatus adapted to synchronize instruction code within a processor of a multiprocessor system configured as a systolic array, the array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the apparatus comprising:
means for initiating a cycle timer value responsive to the processor;
means for setting a threshold value to a specified number of cycles;
means for suspending further execution of the instruction code until the timer value is greater than or equal to the threshold value;
means for resetting the timer value at a beginning of a phase; and
means for selectively resetting the timer value one or more times during the phase by executing a first instruction.
19. A computer readable medium containing executable program instructions for synchronizing instruction code executed by a processor of a processing engine in an intermediate network station, the processing engine configured as a systolic array having a plurality of processors arrayed as rows and columns, each processor of a column executing similar instruction code, the executable program instructions comprising program instructions for:
providing a synchronization operation code (opcode) adapted for execution by the processor;
interpreting the synchronization opcode based on a state of a predetermined bit of a machine state register within the processor; and
performing temporal synchronization when the predetermined bit is non-asserted.
Specification