High Level Programming Extensions For Distributed Data Parallel Processing
First Claim
1. A machine implemented method for distributed parallel processing, comprising:
- accessing an expression from a sequential program that is executing at a first machine, the expression invoking at least one extension for distributed parallel processing by a distributed execution engine;
automatically generating an execution plan for parallel processing of the expression by the distributed execution engine using the at least one extension; and
providing the execution plan to the distributed execution engine for controlling parallel execution of the expression.
2 Assignments
0 Petitions
Accused Products
Abstract
General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.
-
Citations
20 Claims
-
1. A machine implemented method for distributed parallel processing, comprising:
-
accessing an expression from a sequential program that is executing at a first machine, the expression invoking at least one extension for distributed parallel processing by a distributed execution engine; automatically generating an execution plan for parallel processing of the expression by the distributed execution engine using the at least one extension; and providing the execution plan to the distributed execution engine for controlling parallel execution of the expression. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A distributed parallel processing system, comprising:
-
a compute cluster including a plurality of nodes, each node including at least one processor; an execution provider that accesses expressions from a sequential program that is running at a first machine, the execution provider receives at least one expression specifying an extension for distributed data parallel processing in the compute cluster, the execution provider automatically generates an execution plan graph and code for vertices in the execution plan graph using the extension for distributed data parallel processing; and an execution engine that receives the execution plan graph and the code from the execution provider and manages parallel execution of the expression in the compute cluster based on the execution plan graph and the code. - View Dependent Claims (12, 13, 14, 15)
-
-
16. One or more processor readable storage devices having processor readable code stored thereon, the processor readable code programs one or more processors to perform a method comprising:
-
receiving one or more expressions from an application executing at a first machine, the one or more expressions referencing a first dataset; automatically generating an execution plan for executing the one or more expressions in parallel at nodes of a compute cluster, the automatically generating including; determining whether the one or more expressions include an extension specifying a particular data partitioning for the first dataset, if the one or more expressions include the extension specifying a particular data partitioning, generating a first execution graph with the first dataset partitioned according to the particular data partitioning, and if the one or more expressions do not include the extension specifying a particular data partitioning, generating a second execution graph with the first dataset partitioned according to a different data partitioning; and providing the execution plan to an execution engine that controls parallel execution of the one or more expressions in the compute cluster. - View Dependent Claims (17, 18, 19, 20)
-
Specification