Method for finding global extrema of a set of bytes distributed across an array of parallel processing elements
First Claim
1. A method of optimizing the operation of an n-dimensional array of processing elements comprising:
- determining a local extrema for each of said processing elements;
serially outputting, on each clock cycle, said local extrema from each of said processing elements to a neighboring processing element until every processing element in a first dimension has received all local extrema along said first dimension;
determining within each of said processing elements a first dimensional extrema for said first dimension of said n-dimensional array, wherein said dimensional extrema is determined concurrently with the receipt of said local extrema from said processing elements in said first dimension;
serially outputting, on each clock cycle, said first dimensional extrema from each of said processing elements to a neighboring processing element until every processing element in a next dimension has received all first dimensional extrema along said next dimension;
determining within each of said processing elements a next dimensional extrema for a next dimension of said n-dimensional array, wherein said next dimensional extrema is determined concurrently with the receipt of said first dimensional extrema;
repeating said serially outputting on each clock cycle and concurrently determining within each of said processing elements a next dimensional extrema for each of said n-dimensions, wherein each of said next dimensional extrema is determined from a dimensional extrema from a previously selected dimension, until the global extrema is determined; and
saving said global extrema.
8 Assignments
0 Petitions
Accused Products
Abstract
A method for finding an extrema for an n-dimensional array having a plurality of processing elements, the method includes determining within each of the processing elements a dimensional extrema for a first dimension of the n-dimensional array, wherein the dimensional extrema is related to one or more local extrema of the processing elements in the first dimension, determining within each of the processing elements a next dimensional extrema for a next dimension of the n-dimensional array, wherein the next dimensional extrema is related to one or more of the first dimensional extrema, and repeating the determining within each of the processing elements a next dimensional extrema for each of the n-dimensions, wherein each of the next dimensional extrema is related to a dimensional extrema from a previously selected dimension.
399 Citations
19 Claims
-
1. A method of optimizing the operation of an n-dimensional array of processing elements comprising:
-
determining a local extrema for each of said processing elements; serially outputting, on each clock cycle, said local extrema from each of said processing elements to a neighboring processing element until every processing element in a first dimension has received all local extrema along said first dimension; determining within each of said processing elements a first dimensional extrema for said first dimension of said n-dimensional array, wherein said dimensional extrema is determined concurrently with the receipt of said local extrema from said processing elements in said first dimension; serially outputting, on each clock cycle, said first dimensional extrema from each of said processing elements to a neighboring processing element until every processing element in a next dimension has received all first dimensional extrema along said next dimension; determining within each of said processing elements a next dimensional extrema for a next dimension of said n-dimensional array, wherein said next dimensional extrema is determined concurrently with the receipt of said first dimensional extrema; repeating said serially outputting on each clock cycle and concurrently determining within each of said processing elements a next dimensional extrema for each of said n-dimensions, wherein each of said next dimensional extrema is determined from a dimensional extrema from a previously selected dimension, until the global extrema is determined; and saving said global extrema. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. An n-dimensional array of processing elements, comprising:
-
a plurality of processing elements interconnected to form an n-dimensional array, each processing element comprising; an arithmetic logic unit; condition logic responsive to said arithmetic logic unit; a plurality of registers connected to a bus and responsive to said arithmetic logic unit; a result pipeline responsive to said arithmetic logic unit; an interface; and register files connected between said interface and said result pipeline;
said processing elements configured to;determine a local extrema; serially output, on each clock cycle, said local extrema to a neighboring processing element until every processing element in a first dimension has received all local extrema along said first dimension; determine a first dimensional extrema for said first dimension of said n-dimensional array, wherein said dimensional extrema is determined concurrently with the receipt of said local extrema from said processing elements in said first dimension; serially output, on each clock cycle, said first dimensional extrema to a neighboring processing element until every processing element in a next dimension has received all first dimensional extrema along said next dimension; determine a next dimensional extrema for a next dimension of said n-dimensional array, wherein said next dimensional extrema is determined concurrently with the receipt of said first dimensional extrema; repeat said serially outputting on each clock cycle and concurrently determining for each of said n-dimensions, wherein each of a next dimensional extrema is determined from a dimensional extrema from a previously selected dimension, until a global extrema is determined; and save said global extrema. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
Specification