System for parallel processing that compiles a filed sequence of instructions within an iteration space
First Claim
1. In a parallel processor digital data processing apparatus of the type havinga plurality of processing units, each for executing instructions,memory means coupled to said processing units for storing at least one of data and instructions,interprocessor communication means coupled to said processing units for transferring information therebetween,the improvement for processing an iterative sequence of instructions wherein:
- A. said memory means includes means for storing a tiled sequence of instructions representing the iterative sequence, and each of said plural processing units includes means for signalling its availability to execute said tiled sequence over a portion of an iteration space associated with said iterative sequence, said portion being referred to as a tile,B. said apparatus includes next-tile means coupled to said processing units for responding to each of at least selected such signallings by those processing units for generating a signal representing boundaries of a tile over which to execute said tiled sequence, wherein each such tile does not overlap any other tile, wherein each such tile covers a portion less than a whole of said iteration space, and wherein all such tiles together cover said iteration space,C. each said processing unit including means for responding to a boundary-representative signal generated in response to signalling by that processing unit for executing said tiled sequence over the corresponding tile.
1 Assignment
0 Petitions
Accused Products
Abstract
An improved parallel processing apparatus and method executes an iterative sequence of instructions by arranging the sequence into subtasks and allocating those subtasks to processors. This division and allocation is conducted in such a manner as to minimize data contention among the processors and to maximize the locality of data to them. The improved apparatus and method have application to a variety of multiprocessor systems, including those which are massively parallel.
-
Citations
72 Claims
-
1. In a parallel processor digital data processing apparatus of the type having
a plurality of processing units, each for executing instructions, memory means coupled to said processing units for storing at least one of data and instructions, interprocessor communication means coupled to said processing units for transferring information therebetween, the improvement for processing an iterative sequence of instructions wherein: -
A. said memory means includes means for storing a tiled sequence of instructions representing the iterative sequence, and each of said plural processing units includes means for signalling its availability to execute said tiled sequence over a portion of an iteration space associated with said iterative sequence, said portion being referred to as a tile, B. said apparatus includes next-tile means coupled to said processing units for responding to each of at least selected such signallings by those processing units for generating a signal representing boundaries of a tile over which to execute said tiled sequence, wherein each such tile does not overlap any other tile, wherein each such tile covers a portion less than a whole of said iteration space, and wherein all such tiles together cover said iteration space, C. each said processing unit including means for responding to a boundary-representative signal generated in response to signalling by that processing unit for executing said tiled sequence over the corresponding tile. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
-
-
27. In a compiler of the type for translating program signals representative of a program, including an iterative sequence, in source code format into a code sequence of instructions in an object code format suitable for loading for execution by a plurality of parallel digital data processing units,
the improvement comprising tiling means for responding to said iterative sequence in source code format for generating a tiled sequence of instructions representative thereof in object code format, and for further responding to that iterative sequence and to zero, one or more user-specified signals for generating one or more signals representing a framework for generating boundaries for plural tiles over which said tiled sequence is to be executed by said plurality of parallel digital data processing units, each such tile comprising a portion less than a whole of an iteration space associated with said iterative sequence.
-
37. In a method of operating a parallel processor digital data processing apparatus of the type having
a plurality of processing units, each for executing instructions, memory means coupled to said processing units for storing at least one of data and instructions, interprocessor communication means coupled to said processing units for transferring information therebetween, the improvement for processing an iterative sequence of instructions comprising the steps of: -
A. storing a tiled sequence of instructions representing the iterative sequence, B. signalling, for each of said plurality of processing units, its availability to execute said tiled sequence over a portion of an iteration space associated with said iterative sequence, said portion being referred to as a tile, C. responding to each of at least selected such signallings by those processing units for generating a signal representing boundaries of a tile over which to execute said tiled sequence, wherein each such tile does not overlap any other tile, wherein each such tile covers a portion less than a whole of said iteration space, and wherein all such tiles together cover said iteration space, and D. each said processor responding to a boundary-representative signal generated in response to signalling by that processing unit for executing said tiled sequence over the corresponding tile. - View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
-
-
63. In a method for operating a compiler of the type for translating program signals representative of a program, including an iterative sequence, in source code format into a code sequence of instructions in an object code format suitable for loading for execution by a plurality of parallel digital data processing units,
the improvement comprising the steps of responding to said iterative sequence in source code format for generating a tiled sequence of instructions representative thereof in object code format, and further responding to that iterative sequence and to zero, one or more user-specified signals for generating one or more signals representing a framework for generating boundaries for plural tiles over which said tiled sequence is to be executed by said plurality of parallel digital data processing units, each such tile comprising a portion less than a whole of an iteration space associated with said iterative sequence.
Specification