Data processing in a hybrid computing environment
First Claim
1. A method of data processing in a hybrid computing environment, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the method comprising:
assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the local shared memory for the accelerators comprises:
performing, by each accelerator, the accelerator's assigned specific function including:
determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
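The steps of the claim can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the `HybridEnv` class, the summation used as the "specific function", the binary-heap layout of the logical tree, and the rule that ranks run in descending order are all hypothetical choices made for illustration. Plain Python lists and dicts stand in for the accelerators' shared memory and the host's shared memory.

```python
from dataclasses import dataclass, field


@dataclass
class HybridEnv:
    """Toy stand-in for the memory regions named in the claim (hypothetical)."""
    n_accel: int
    accel_shared: list = field(default_factory=list)  # one dedicated slot per accelerator
    host_shared: dict = field(default_factory=dict)   # shared memory local to the host
    counter: int = 0                                  # incremented after each function runs

    def __post_init__(self):
        self.accel_shared = [None] * self.n_accel


def is_authorized(env, rank):
    # Assumed policy: ranks run in descending order, so both children
    # (ranks 2*rank+1 and 2*rank+2) have stored results before the parent runs.
    return env.counter == env.n_accel - 1 - rank


def specific_function(env, rank, local_value):
    # Assumed function: add this accelerator's value to its children's partial sums.
    total = local_value
    for child in (2 * rank + 1, 2 * rank + 2):
        if child < env.n_accel:
            total += env.accel_shared[child]
    return total


def local_reduction(env, inputs):
    # Each pass lets exactly one accelerator (the authorized rank) proceed.
    while env.counter < env.n_accel:
        for rank in range(env.n_accel):
            if is_authorized(env, rank):
                # Store the result in the accelerator's dedicated slot.
                env.accel_shared[rank] = specific_function(env, rank, inputs[rank])
                env.counter += 1
                break
    # One accelerator (here the root, rank 0) writes the result to host memory.
    env.host_shared["result"] = env.accel_shared[0]


def host_read(env):
    # The host reads the result from its own local shared memory.
    return env.host_shared["result"]


env = HybridEnv(n_accel=5)
local_reduction(env, [1, 2, 3, 4, 5])
print(host_read(env))  # prints 15
```

In this sketch the counter doubles as the authorization token: an accelerator compares it against its own rank, and each increment hands the turn to the next lower rank, so a parent never reads a child's slot before it is filled.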
Abstract
Data processing in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, where data processing according to embodiments of the present invention includes: performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators; writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
12 Claims
1. A method of data processing in a hybrid computing environment, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the method comprising:
assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the local shared memory for the accelerators comprises:
performing, by each accelerator, the accelerator's assigned specific function including:
determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
Dependent claims: 2, 3, 4.
5. A hybrid computing environment for data processing, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the plurality of accelerators comprising computer program instructions capable of:
assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the local shared memory for the accelerators comprises:
performing, by each accelerator, the accelerator's assigned specific function including:
determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
the host computer comprising computer program instructions capable of reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
Dependent claims: 6, 7, 8.
9. A computer program product for data processing in a hybrid computing environment, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the computer program product disposed in a computer readable, recordable storage medium, the computer program product comprising computer program instructions capable of:
assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the local shared memory for the accelerators comprises:
performing, by each accelerator, the accelerator's assigned specific function including:
determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
Dependent claims: 10, 11, 12.
Specification