General Distributed Reduction For Data Parallel Computing
Abstract
General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
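The key optimization in the abstract, pushing part of the reduction into the map stage, can be sketched as follows. This is an illustrative example rather than the patented implementation, and the function names are hypothetical: decomposing an average into per-partition (sum, count) partials lets each map node collapse its partition to a single pair before any data crosses the network.

```python
# Hypothetical sketch (not the patented implementation): decompose an
# average so a partial reduction runs in the map phase and only compact
# (sum, count) pairs reach the reduce phase.

def map_phase(partition):
    """Map-side partial reduction: one (sum, count) pair per partition."""
    total, count = 0, 0
    for value in partition:
        total += value
        count += 1
    return (total, count)

def reduce_phase(partials):
    """Final reduction: combine the per-partition partial results."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Each partition would normally live on a different cluster node.
partitions = [[1, 2, 3], [4, 5], [6]]
partials = [map_phase(p) for p in partitions]
print(reduce_phase(partials))  # 3.5
```

Because the average itself is not associative, the decomposition into (sum, count) is what makes the map-side partial reduction legal; the annotations mentioned in the abstract are the mechanism by which a function declares that such a decomposition exists.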
Claims (20)
1. A machine implemented method for distributed parallel processing, comprising:
receiving an expression from a sequential program that is executing at a first machine;
automatically generating an execution plan including a map phase and a reduction phase for executing the expression in parallel at nodes of a compute cluster; and
providing the execution plan to an execution engine that controls parallel execution of the expression in the compute cluster.
(Dependent claims 2-9 not shown.)
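The flow of claim 1 (receive an expression, generate a two-phase plan, hand the plan to an engine) can be sketched roughly as follows. All names here are hypothetical, and the single-process "engine" stands in for the compute cluster the claim describes.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ExecutionPlan:
    """Two-phase plan: a map function applied per input partition,
    then a reduce function over the collected partial results."""
    map_fn: Callable
    reduce_fn: Callable

def generate_plan(expression: dict) -> ExecutionPlan:
    # A real provider would analyze the expression tree captured from
    # the sequential program; here the expression is a plain dict.
    return ExecutionPlan(map_fn=expression["map"], reduce_fn=expression["reduce"])

def execution_engine(plan: ExecutionPlan, partitions: List[list]):
    """Stand-in engine: each map task would run on its own cluster node."""
    partials = [plan.map_fn(part) for part in partitions]
    return plan.reduce_fn(partials)

expression = {"map": sum, "reduce": sum}  # total over all partitions
plan = generate_plan(expression)
print(execution_engine(plan, [[1, 2], [3, 4], [5]]))  # 15
```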
10. One or more processor readable storage devices having processor readable code stored thereon, the processor readable code programs one or more processors to perform a method comprising:
accessing an expression from a sequential program that is executing at a first machine, the expression including related select and groupby functions;
automatically generating an execution plan for parallel processing of the expression by a distributed execution engine, the execution plan including a map phase corresponding to the select function and a reduce phase corresponding to the groupby function, the map phase specifying a partition of an input dataset for distribution to a plurality of nodes in the distributed execution engine;
optimizing the execution plan by decomposing the groupby function and pushing at least a portion of the decomposed groupby function from the reduce phase to the map phase, the at least a portion of the decomposed groupby function including a partial reduction to reduce a size of the input dataset prior to the partition; and
providing the execution plan to the distributed execution engine for controlling parallel execution of the expression.
(Dependent claims 11-14 not shown.)
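Claim 10's optimization, decomposing the groupby and running a partial reduction before the input is partitioned, might look like the following sketch. The names are hypothetical, and a per-key count stands in for the decomposed groupby aggregation.

```python
from collections import defaultdict

def map_phase(records, select):
    """Apply the select projection, then the pushed-down partial
    reduction: per-key counts computed locally, shrinking the data
    before it is partitioned across nodes."""
    partial = defaultdict(int)
    for r in records:
        partial[select(r)] += 1
    return dict(partial)

def partition(partial, n_nodes):
    """Hash-partition the already-reduced key/count pairs by key."""
    shards = [defaultdict(int) for _ in range(n_nodes)]
    for key, count in partial.items():
        shards[hash(key) % n_nodes][key] += count
    return shards

def reduce_phase(shards_for_node):
    """Final reduction: merge partial counts arriving from every mapper."""
    merged = defaultdict(int)
    for shard in shards_for_node:
        for key, count in shard.items():
            merged[key] += count
    return dict(merged)

# Two input partitions; each mapper reduces locally before partitioning.
inputs = [["a", "b", "a"], ["b", "b", "c"]]
mapped = [map_phase(p, select=str.upper) for p in inputs]
shards = [partition(m, n_nodes=2) for m in mapped]
# Node i merges shard i from every mapper.
result = {}
for i in range(2):
    result.update(reduce_phase([s[i] for s in shards]))
print(sorted(result.items()))  # [('A', 2), ('B', 3), ('C', 1)]
```

The point of the claimed ordering is visible in the data volumes: each mapper ships at most one record per distinct key, rather than one record per input element, so the partition step moves less data.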
15. A distributed parallel processing system, comprising:
a compute cluster including a plurality of nodes, each node including at least one processor;
an execution provider that accesses expressions from a sequential program that is running at a first machine, the execution provider determines that at least one expression from the sequential program can be expressed using map and reduction computations, the execution provider automatically generates an execution plan graph and code for vertices in the execution plan graph for parallel processing of the at least one expression, the execution provider generates at least one map phase and at least one reduce phase for the execution plan graph, the code for the vertices in the execution plan graph implements the map and reduction computations; and
an execution engine that receives the execution plan graph and the code from the execution provider and manages parallel execution of the expression in the compute cluster based on the execution plan graph and the code.
(Dependent claims 16-20 not shown.)
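Claim 15's execution plan graph, with vertices that carry the map or reduction code, could be represented roughly as below. This is a hypothetical sketch: the toy engine evaluates the graph in-process, whereas the claimed engine would schedule each vertex on a cluster node.

```python
class Vertex:
    """A node in the execution plan graph: a name, the code it runs,
    and the vertices whose outputs it consumes."""
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, list(inputs)

def build_plan_graph(map_fn, reduce_fn, n_partitions):
    """One map vertex per input partition, all feeding one reduce vertex."""
    maps = [Vertex(f"map-{i}", map_fn) for i in range(n_partitions)]
    return Vertex("reduce", reduce_fn, inputs=maps)

def run(root, partitions):
    """Toy engine: evaluate each input vertex on its partition, then
    the root vertex on the collected partial results."""
    partials = [v.fn(part) for v, part in zip(root.inputs, partitions)]
    return root.fn(partials)

parts = [[2, 4], [6], [8, 10]]
graph = build_plan_graph(map_fn=max, reduce_fn=max, n_partitions=len(parts))
print(run(graph, parts))  # 10
```

Separating the graph (structure) from the vertex code (behavior) mirrors the claim's split between the execution plan graph and the code for its vertices, which the provider hands to the engine together.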
Specification