Runtime optimization of distributed execution graph
First Claim
1. A method for performing distributed parallel processing, comprising:
- assigning a first set of code corresponding to vertices of a graph to various nodes of a distributed parallel processing engine based on said graph, said graph defines a parallel processing job, vertices of said graph represent code and edges represent data channels;
automatically modifying said graph during runtime based on executing said first set of code, wherein said automatically modifying said graph comprises determining where in a network said first set of code executed, grouping together at least a first subset of said first set of code that executed in a common section of said network and adding additional code to process data from said first subset, said additional code corresponds to a new vertex on said graph, said additional code is assigned to run in said common section, said additional code is a copy of code for an existing vertex; and
assigning a second set of code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said modified graph, said first set of code and said second set of code are part of said parallel processing job.
2 Assignments
0 Petitions
Accused Products
Abstract
A general purpose high-performance distributed execution engine for coarse-grained data-parallel applications is proposed that allows developers to easily create large-scale distributed applications without requiring them to master concurrency techniques beyond being able to draw a graph of the data-dependencies of their algorithms. Based on the graph, a job manager intelligently distributes the work load so that the resources of the execution engine are used efficiently. During runtime, the job manager (or other entity) can automatically modify the graph to improve efficiency. The modifications are based on runtime information, topology of the distributed execution engine, and/or the distributed application represented by the graph.
-
Citations
16 Claims
-
1. A method for performing distributed parallel processing, comprising:
-
assigning a first set of code corresponding to vertices of a graph to various nodes of a distributed parallel processing engine based on said graph, said graph defines a parallel processing job, vertices of said graph represent code and edges represent data channels; automatically modifying said graph during runtime based on executing said first set of code, wherein said automatically modifying said graph comprises determining where in a network said first set of code executed, grouping together at least a first subset of said first set of code that executed in a common section of said network and adding additional code to process data from said first subset, said additional code corresponds to a new vertex on said graph, said additional code is assigned to run in said common section, said additional code is a copy of code for an existing vertex; and assigning a second set of code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said modified graph, said first set of code and said second set of code are part of said parallel processing job. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. One or more processor readable storage devices having processor readable code stored thereon, said processor readable code programs one or more processors to perform a method comprising:
-
accessing a representation of a directed acyclic graph defining a parallel processing job, said graph having vertices and edges connecting to said vertices, said vertices represents sets of program code of said parallel processing job to be run on a distributed execution engine and edges represent data channels; automatically modifying said graph during runtime of said parallel processing job on said distributed execution engine based on runtime information of said parallel processing job on said distributed execution engine, wherein said automatically modifying comprises determining where a first set of vertices were executed in said distributed execution engine, grouping together vertices of said first set based on where they ran in said distributed execution engine, adding new vertices to each group and adding new edges for said new vertices, said ne vertices are copies of existing vertices; and assigning code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said graph. - View Dependent Claims (9, 10, 11, 12, 13)
-
-
14. A distributed parallel processing system, comprising:
-
a network; a plurality of computing machines connected to said network; and a job manager connected to said network and in communication with said computing machines, said job manager manages execution of a job defined by a user customizable graph, said user customizable graph includes a set of vertices corresponding to a set of program units and edges corresponding to data channels, said job manager assigns said program units for execution on said computing machines based on said graph; said job manager automatically modifies said graph during runtime of said job based on topology of said network, availability of said computing machines and said vertices; said job manager automatically modifies said graph by determining where a first set of vertices were executed in said network, grouping together vertices of said first set based on where they ran in said network;
adding new vertices to the graph, and adding new edges for said new vertices;the new vertices correspond to code from existing vertices. - View Dependent Claims (15, 16)
-
Specification