Data processing in a hybrid computing environment

US 9,170,864 B2
Filed: 01/29/2009
Issued: 10/27/2015
Est. Priority Date: 01/29/2009
Status: Expired due to Fees

Abstract

Data processing in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, where data processing according to embodiments of the present invention includes performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators; writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.
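
The three phases named in the abstract (local reduction among the accelerators, a remote write by one accelerator, and a local read by the host) can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration: the two memories are modelled as ordinary Python objects, the reduction operator defaults to addition, and the names `local_reduction`, `run`, and `reduction_result` are hypothetical rather than taken from the patent.

```python
from functools import reduce
import operator


def local_reduction(contributions, op=operator.add):
    """Combine the accelerators' contributions inside their own shared memory."""
    # Hypothetical layout: one slot per accelerator in the accelerators' shared memory.
    accel_shared = list(contributions)
    return reduce(op, accel_shared)


def run(contributions, op=operator.add):
    # Stand-in for the host's local memory that is shared remotely with the accelerators.
    host_shared = {}

    # Phase 1: the accelerators perform the reduction in their local shared memory.
    result = local_reduction(contributions, op)

    # Phase 2: one accelerator writes the result remotely into host memory.
    host_shared["reduction_result"] = result

    # Phase 3: the host reads the result from its own local memory.
    return host_shared["reduction_result"]


total = run([3, 1, 4, 1, 5])
```

In this sketch the host never touches the accelerators' memory; it sees only the single value written into its own shared region.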

12 Claims

1. A method of data processing in a hybrid computing environment, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the method comprising:
- assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
- performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the location shared memory for the accelerators comprises;
  - performing, by each accelerator, the accelerator's assigned specific function including;
    - determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
    - if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
  - storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
- writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
- reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.

2. The method of claim 1 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having no children in the logical tree to store the accelerator's contribution data in the accelerator's dedicated portion of the local shared memory for the accelerators.

3. The method of claim 1 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having one or more children in the logical tree to read the contents of the children's dedicated portions of the local shared memory for the accelerators and to perform the specific function with the read contents and contribution data of the accelerator having one or more children.

4. The method of claim 1 wherein determining whether the accelerator is authorized to perform the accelerator's assigned specific function further comprises determining whether the counter exceeds a predetermined threshold for a level of depth of the logic tree to which the accelerator is assigned.
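
One way to read claims 1 through 4 together is as a tree reduction gated by a shared counter. The sketch below is a sequential simulation under several assumptions that the claims do not state: ranks are laid out as a binary tree numbered level by level (children of rank `r` are `2r + 1` and `2r + 2`), the counter is a single shared integer, and the threshold for a depth is chosen so that an accelerator becomes authorized only after every accelerator deeper in the tree has finished. The names `depth_of`, `tree_reduce`, `slots`, and `threshold` are hypothetical, and the input is assumed to be non-empty.

```python
import operator


def depth_of(rank):
    """Depth of a rank in a binary tree numbered level by level (root is rank 0)."""
    return (rank + 1).bit_length() - 1


def tree_reduce(contributions, op=operator.add):
    n = len(contributions)
    slots = [None] * n   # each accelerator's dedicated portion of the shared memory
    counter = 0          # incremented whenever an accelerator performs its function
    depths = [depth_of(r) for r in range(n)]

    def threshold(depth):
        # Assumed rule: every accelerator deeper in the tree must have finished,
        # so "counter exceeds threshold" means counter >= number of deeper ranks.
        deeper = sum(1 for d in depths if d > depth)
        return deeper - 1

    done = [False] * n
    while not all(done):
        for rank in range(n):  # simulate the accelerators polling in turn
            authorized = counter > threshold(depths[rank])
            if done[rank] or not authorized:
                continue
            # Leaves store their own contribution (claim 2); parents also fold in
            # the contents of their children's dedicated portions (claim 3).
            value = contributions[rank]
            for child in (2 * rank + 1, 2 * rank + 2):
                if child < n:
                    value = op(value, slots[child])
            slots[rank] = value
            counter += 1
            done[rank] = True

    # The root accelerator writes remotely to host memory; the host then reads it.
    host_shared = {"result": slots[0]}
    return host_shared["result"], counter


result, performed = tree_reduce([1, 2, 3, 4, 5, 6, 7])
```

With seven contributions the four leaves run first, then the two depth-1 accelerators, then the root, so the counter ends at 7. A real system would run the accelerators concurrently and would need an atomic counter, which this simulation does not attempt to model.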

5. A hybrid computing environment for data processing, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the plurality of accelerators comprising computer program instructions capable of:
- assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
- performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the location shared memory for the accelerators comprises;
  - performing, by each accelerator, the accelerator's assigned specific function including;
    - determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
    - if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
  - storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
- writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
- the host computer comprising computer program instructions capable of reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.

6. The hybrid computing environment of claim 5 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having no children in the logical tree to store the accelerator's contribution data in the accelerator's dedicated portion of the local shared memory for the accelerators.

7. The hybrid computing environment of claim 5 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having one or more children in the logical tree to read the contents of the children's dedicated portions of the local shared memory for the accelerators and to perform the specific function with the read contents and contribution data of the accelerator having one or more children.

8. The hybrid computing environment of claim 5 wherein determining whether the accelerator is authorized to perform the accelerator's assigned specific function further comprises determining whether the counter exceeds a predetermined threshold for a level of depth of the logic tree to which the accelerator is assigned.

9. A computer program product for data processing in a hybrid computing environment, the hybrid computing environment comprising a host computer having a host computer architecture, a plurality of accelerators having an accelerator architecture, wherein each of the plurality of accelerators has a dedicated portion of the local shared memory for the accelerators, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, the computer program product disposed in a computer readable, recordable storage medium, the computer program product comprising computer program instructions capable of:
- assigning to each accelerator a rank in a logical tree and a specific function to perform, the results of the specific function to be stored in the accelerator's dedicated portion;
- performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators, wherein performing the local reduction operation with the location shared memory for the accelerators comprises;
  - performing, by each accelerator, the accelerator's assigned specific function including;
    - determining, by the accelerator in dependence upon the accelerator's assigned rank, whether the accelerator is authorized to perform the accelerator's assigned specific function; and
    - if the accelerator is authorized to perform the accelerator's assigned specific function, performing, by the accelerator, the specific function and incrementing, by the accelerator, a counter; and
  - storing locally by each accelerator the results of the accelerator's assigned specific function in the accelerator's dedicated portion of the local shared memory for the accelerators;
- writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and
- reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation.

10. The computer program product of claim 9 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having no children in the logical tree to store the accelerator's contribution data in the accelerator's dedicated portion of the local shared memory for the accelerators.

11. The computer program product of claim 9 wherein assigning each accelerator a specific function to perform further comprises instructing an accelerator having one or more children in the logical tree to read the contents of the children's dedicated portions of the local shared memory for the accelerators and to perform the specific function with the read contents and contribution data of the accelerator having one or more children.

12. The computer program product of claim 9 wherein determining whether the accelerator is authorized to perform the accelerator's assigned specific function further comprises determining whether the counter exceeds a predetermined threshold for a level of depth of the logic tree to which the accelerator is assigned.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Archer, Charles J.; Carey, James E.; Markland, Matthew W.; Sanders, Philip J.; Schimke, Timothy J.
Primary Examiner(s): LIN, WEN TAI
Application Number: US12/362,137
Publication Number: US 20100191823A1
Time in Patent Office: 2,462 Days
US Class Current: 1/1
CPC Class Codes:
G06F 2209/5017   Task decomposition
G06F 2209/509   Offload
G06F 9/5055   considering software capabi...
G06F 9/5066   Algorithms for mapping a pl...
G06F 9/544   Buffers; Shared memory; Pipes
