Synchronization in a Multi-Tile, Multi-Chip Processing Arrangement
First Claim
1. A method of operating a system comprising a plurality of processor tiles divided into a plurality of domains wherein each domain has inter-tile connections via a time-deterministic interconnect, and inter-domain tile connections are made via a non-time-deterministic interconnect;
the method comprising:
performing a first compute phase at a first tile in a first domain;
performing a second compute phase at a second tile in the first domain;
performing an internal barrier synchronization within the first domain to require that the first tile has completed the first compute phase and the second tile has completed the second compute phase before proceeding to a first internal exchange phase at the first domain;
following the internal barrier synchronization, performing the first internal exchange phase between the first tile and the second tile within the first domain, in which the first tile communicates results of its computations to the second tile via the time-deterministic interconnect, wherein the first internal exchange phase does not include communicating computation results from the first domain to a second domain;
performing an external barrier synchronization to require the first tile and the second tile of the first domain have completed the first internal exchange phase and a third tile of the second domain has completed a second internal exchange phase at the second domain before any of the first tile, the second tile, or the third tile is allowed to proceed to an external exchange phase; and
following the external barrier synchronization, performing the external exchange phase in which the first tile communicates results of its computations with the third tile via the non-time-deterministic interconnect.
Abstract
A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
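The ordering described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: threads stand in for tiles, `threading.Barrier` stands in for the internal and external barrier synchronizations, and a shared dictionary stands in for both interconnects, so the sketch models only the ordering of phases and not the timing behaviour of a time-deterministic or non-time-deterministic interconnect. The names `DOMAINS`, `run_superstep`, and `tile_main`, the two-domain layout, and the squared-value "computation" are all hypothetical.

```python
import threading
from collections import defaultdict

# Hypothetical layout: tiles 1 and 2 share domain "A"; tile 3 is in domain "B".
DOMAINS = {"A": [1, 2], "B": [3]}


def run_superstep(domains=DOMAINS):
    """Run one compute / internal-exchange / external-exchange sequence."""
    tiles = [(d, t) for d, members in domains.items() for t in members]
    # One internal barrier per domain, one external barrier across all tiles.
    internal = {d: threading.Barrier(len(members)) for d, members in domains.items()}
    external = threading.Barrier(len(tiles))
    inbox = defaultdict(list)  # receiving tile -> [(phase, sender, value), ...]
    log = []                   # ordered record of (event, tile)
    lock = threading.Lock()

    def tile_main(domain, tile):
        # Compute phase: purely local work (a stand-in calculation).
        value = tile * tile
        with lock:
            log.append(("compute", tile))

        # Internal barrier: every tile of this domain has finished computing.
        internal[domain].wait()

        # Internal exchange: results go only to peers in the same domain.
        with lock:
            for peer in domains[domain]:
                if peer != tile:
                    inbox[peer].append(("internal", tile, value))
            log.append(("internal_exchange", tile))

        # External barrier: every tile in every domain has finished its
        # internal exchange before any tile proceeds.
        external.wait()

        # External exchange: results go to tiles in other domains.
        with lock:
            log.append(("external_exchange", tile))
            for other, members in domains.items():
                if other != domain:
                    for peer in members:
                        inbox[peer].append(("external", tile, value))

    threads = [threading.Thread(target=tile_main, args=dt) for dt in tiles]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log, dict(inbox)


if __name__ == "__main__":
    log, inbox = run_superstep()
    for event, tile in log:
        print(f"tile {tile}: {event}")
```

In this sketch no tile logs an external exchange until every tile has logged its internal exchange, and within a domain no tile logs an internal exchange until all of that domain's tiles have logged their compute phase. The interleaving of tiles within a single phase is left to the thread scheduler.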
20 Claims
1. A method of operating a system comprising a plurality of processor tiles divided into a plurality of domains wherein each domain has inter-tile connections via a time-deterministic interconnect, and inter-domain tile connections are made via a non-time-deterministic interconnect;
the method comprising:
performing a first compute phase at a first tile in a first domain;
performing a second compute phase at a second tile in the first domain;
performing an internal barrier synchronization within the first domain to require that the first tile has completed the first compute phase and the second tile has completed the second compute phase before proceeding to a first internal exchange phase at the first domain;
following the internal barrier synchronization, performing the first internal exchange phase between the first tile and the second tile within the first domain, in which the first tile communicates results of its computations to the second tile via the time-deterministic interconnect, wherein the first internal exchange phase does not include communicating computation results from the first domain to a second domain;
performing an external barrier synchronization to require the first tile and the second tile of the first domain have completed the first internal exchange phase and a third tile of the second domain has completed a second internal exchange phase at the second domain before any of the first tile, the second tile, or the third tile is allowed to proceed to an external exchange phase; and
following the external barrier synchronization, performing the external exchange phase in which the first tile communicates results of its computations with the third tile via the non-time-deterministic interconnect.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. At least one non-transitory computer-readable storage having encoded thereon code configured so as when executed on a plurality of tiles performs operations including:
perform a first compute phase at a first tile in a first domain;
perform a second compute phase at a second tile in the first domain;
perform an internal barrier synchronization within the first domain to require that the first tile has completed the first compute phase and the second tile has completed the second compute phase before proceeding to a first internal exchange phase at the first domain;
following the internal barrier synchronization, perform the first internal exchange phase between the first tile and the second tile within the first domain, in which the first tile communicates results of its computations to the second tile via a time-deterministic interconnect, wherein the first internal exchange phase does not include communicating computation results from the first domain to a second domain;
perform an external barrier synchronization to require the first tile and the second tile of the first domain have completed the first internal exchange phase and a third tile of the second domain has completed a second internal exchange phase at the second domain before any of the first tile, the second tile, or the third tile is allowed to proceed to an external exchange phase; and
following the external barrier synchronization, perform the external exchange phase in which the first tile communicates results of its computations with the third tile via a non-time-deterministic interconnect between the first domain and the second domain.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A system comprising a plurality of processor tiles divided into a plurality of domains wherein each domain has inter-tile connections via a time-deterministic interconnect, and inter-domain tile connections are made via a non-time-deterministic interconnect;
the system being programmed to perform operations of:
performing a first compute phase at a first tile in a first domain;
performing a second compute phase at a second tile in the first domain;
performing an internal barrier synchronization within the first domain to require that the first tile has completed the first compute phase and the second tile has completed the second compute phase before proceeding to a first internal exchange phase at the first domain;
following the internal barrier synchronization, performing the first internal exchange phase between the first tile and the second tile within the first domain, in which the first tile communicates results of its computations to the second tile via the time-deterministic interconnect, wherein the first internal exchange phase does not include communicating computation results from the first domain to a second domain;
performing an external barrier synchronization to require the first tile and the second tile of the first domain have completed the first internal exchange phase and a third tile of the second domain has completed a second internal exchange phase at the second domain before any of the first tile, the second tile, or the third tile is allowed to proceed to an external exchange phase; and
following the external barrier synchronization, performing the external exchange phase in which the first tile communicates results of its computations with the third tile via the non-time-deterministic interconnect.
Dependent claims: 19, 20
Specification