General purpose distributed data parallel computing using a high level language
First Claim
Patent Images
1. A system, comprising:
- a processor readable storage hardware device;
a user interface; and
a processor coupled to the processor readable hardware storage device and to the user interface, wherein the processor readable hardware storage device includes instructions that cause the processor to;
execute a sequential application program comprising a data parallel portion that includes an expression, wherein the application is written in a high-level language and comprises both imperative operations and declarative operations;
access the expression from a portion of the sequential application program that comprises a declarative operation;
based on the expression, automatically generate an execution plan graph, the execution plan graph including a directed graph having vertices that represent processes and edges between the vertices that represent data channels, the execution plan graph for executing the expression in parallel at nodes of a compute cluster, including causing the processor to break the expression into a plurality of sub-expressions, each of the sub-expressions is a vertex in the directed graph;
automatically generate vertex code for the vertices of the execution plan graph;
automatically generate serialization code that allows data to be passed in the data channels between the vertices;
provide the execution plan graph, the serialization code, and the vertex code to an execution engine of the compute cluster that manages parallel execution of the expression in the compute cluster based on the execution plan graph, the serialization code, and the vertex code;
receive results of executing the execution plan graph in the compute cluster; and
execute a portion of the sequential application program that comprises an imperative operation to present the results in the user interface.
2 Assignments
0 Petitions
Accused Products
Abstract
General-purpose distributed data-parallel computing using a high-level language is disclosed. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. The distributed execution plan is then executed on large compute clusters. Thus, the developer is allowed to write the program using familiar programming constructs in the high level language. Moreover, developers without experience with distributed compute systems are able to take advantage of such systems.
26 Citations
20 Claims
-
1. A system, comprising:
-
a processor readable storage hardware device; a user interface; and a processor coupled to the processor readable hardware storage device and to the user interface, wherein the processor readable hardware storage device includes instructions that cause the processor to; execute a sequential application program comprising a data parallel portion that includes an expression, wherein the application is written in a high-level language and comprises both imperative operations and declarative operations; access the expression from a portion of the sequential application program that comprises a declarative operation; based on the expression, automatically generate an execution plan graph, the execution plan graph including a directed graph having vertices that represent processes and edges between the vertices that represent data channels, the execution plan graph for executing the expression in parallel at nodes of a compute cluster, including causing the processor to break the expression into a plurality of sub-expressions, each of the sub-expressions is a vertex in the directed graph; automatically generate vertex code for the vertices of the execution plan graph; automatically generate serialization code that allows data to be passed in the data channels between the vertices; provide the execution plan graph, the serialization code, and the vertex code to an execution engine of the compute cluster that manages parallel execution of the expression in the compute cluster based on the execution plan graph, the serialization code, and the vertex code; receive results of executing the execution plan graph in the compute cluster; and execute a portion of the sequential application program that comprises an imperative operation to present the results in the user interface. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A machine implemented method comprising:
-
executing, by a processor on a client machine, a sequential application program comprising a data parallel portion that includes an expression, wherein the application is written in a high-level language and comprises both imperative operations and declarative operations; accessing, by the processor, an expression from a portion of the sequential application program that comprises a declarative operation; based on the expression, the processor automatically generating an execution plan graph, the execution plan graph including a directed graph having vertices that represent processes and edges between the vertices that represent data channels, the execution plan graph for executing the expression in parallel at nodes of a compute cluster, including the processor breaking the expression into a plurality of sub-expressions, wherein each of the sub-expressions is a vertex in the directed graph; automatically generating vertex code for the vertices of the execution plan graph by the processor; automatically generating serialization code that allows data to be passed in the data channels between the vertices by the processor; providing, from the client machine to an execution engine of the compute cluster, the execution plan graph, the serialization code, and the vertex code, wherein the execution engine manages parallel execution of the expression in the compute cluster based on the execution plan graph, the serialization code, and the vertex code; receiving, by the client machine, results of executing the execution plan graph in the compute cluster; and executing, by the processor, a portion of the sequential application program that comprises an imperative operation to present the results in a user interface associated with the client machine. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer readable storage hardware device having stored thereon computer executable instructions which, when executed on a processor, cause the processor to:
-
execute a sequential application program comprising a data parallel portion that includes an expression, wherein the application is written in a high-level language and comprises both imperative operations and declarative operations; access the expression from a portion of the sequential application program that comprises a declarative operation; based on the expression, automatically generate an execution plan graph, the execution plan graph including a directed graph having vertices that represent processes and edges between the vertices that represent data channels, the execution plan graph for executing the expression in parallel at nodes of a compute cluster, including causing the processor to break the expression into a plurality of sub-expressions, each of the sub-expressions is a vertex in the directed graph; automatically generate vertex code for the vertices of the execution plan graph; automatically generate serialization code that allows data to be passed in the data channels between the vertices; provide the execution plan graph, the serialization code, and the vertex code to an execution engine of the compute cluster that manages parallel execution of the expression in the compute cluster based on the execution plan graph, the serialization code, and the vertex code; receive results of executing the execution plan graph in the compute cluster; and execute a portion of the sequential application program that comprises an imperative operation to present the results in a user interface. - View Dependent Claims (18, 19, 20)
-
Specification