RUNTIME OPTIMIZATION OF DISTRIBUTED EXECUTION GRAPH
First Claim
1. A method for performing distributed parallel processing, comprising:
- assigning a first set of code corresponding to vertices of a graph to various nodes of a distributed parallel processing engine based on said graph, said graph defines a parallel processing job;
automatically modifying said graph based on executing said first set of code; and
assigning a second set of code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said modified graph, said first set of code and said second set of code are part of said parallel processing job.
2 Assignments
0 Petitions
Accused Products
Abstract
A general purpose high-performance distributed execution engine for coarse-grained data-parallel applications is proposed that allows developers to easily create large-scale distributed applications without requiring them to master concurrency techniques beyond being able to draw a graph of the data-dependencies of their algorithms. Based on the graph, a job manager intelligently distributes the work load so that the resources of the execution engine are used efficiently. During runtime, the job manager (or other entity) can automatically modify the graph to improve efficiency. The modifications are based on runtime information, topology of the distributed execution engine, and/or the distributed application represented by the graph.
142 Citations
20 Claims
-
1. A method for performing distributed parallel processing, comprising:
-
assigning a first set of code corresponding to vertices of a graph to various nodes of a distributed parallel processing engine based on said graph, said graph defines a parallel processing job; automatically modifying said graph based on executing said first set of code; and assigning a second set of code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said modified graph, said first set of code and said second set of code are part of said parallel processing job. - View Dependent Claims (2, 3, 4, 5, 7, 8, 9)
-
-
6. A method according to claim 6, wherein:
said additional code, said first set of code and said graph are provided by a developer.
-
10. One or more processor readable storage devices having processor readable code stored thereon, said processor readable code programs one or more processors to perform a method comprising:
-
accessing a representation of a directed acyclic graph defining a parallel processing job, said graph having vertices and edges connecting to said vertices, said vertices represents sets of program code of said parallel processing job to be run on a distributed execution engine; automatically modifying said graph during runtime of said parallel processing job on said distributed execution engine based on runtime information of said parallel processing job on said distributed execution engine; and assigning code corresponding to vertices of said graph to various nodes of said distributed parallel processing engine based on said graph. - View Dependent Claims (11, 12, 13, 14, 15, 16)
-
-
17. A distributed parallel processing system, comprising:
-
a network; a plurality of computing machines connected to said network; and a job manager connected to said network and in communication with said computing machines, said job manager manages execution of a job defined by a user customizable graph, said user customizable graph includes a set of vertices corresponding to a set of program units and edges corresponding to data channels, said job manager assigns said program units for execution on said computing machines based on said graph; said job manager automatically modifies said graph during runtime of said job based on topology of said network, availability of said computing machines and said vertices. - View Dependent Claims (18, 19, 20)
-
Specification