DISTRIBUTED PARALLEL COMPUTING
First Claim
Patent Images
1. A method for performing distributed parallel processing, comprising:
- finding one or more locations for each of a set of data files in a distributed data processing system having multiple nodes, said data files are for use by a set of program units;
identifying nodes of said distributed data processing system that are available for executing said program units and are near said data files; and
sending instructions to nodes near said data files to execute said program units.
2 Assignments
0 Petitions
Accused Products
Abstract
A general purpose high-performance distributed execution engine for coarse-grained data-parallel applications is proposed that allows developers to easily create large-scale distributed applications without requiring them to master concurrency techniques beyond being able to draw a graph of the data-dependencies of their algorithms. Based on the graph, a job manager intelligently distributes the work load so that system resources are used efficiently. The system is designed to scale from a small cluster of a few computers, or the multiple CPU cores on a powerful single computer, up to a data center containing thousands of servers.
-
Citations
20 Claims
-
1. A method for performing distributed parallel processing, comprising:
-
finding one or more locations for each of a set of data files in a distributed data processing system having multiple nodes, said data files are for use by a set of program units; identifying nodes of said distributed data processing system that are available for executing said program units and are near said data files; and sending instructions to nodes near said data files to execute said program units. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A distributed parallel processing system, comprising:
-
a job manager, said job manager manages execution of a system defined by a user customizable directed acyclic graph, said user customizable graph includes a set of vertices corresponding to a set of program units; and a plurality of computing machines in communication with said job manager, said plurality of computing machines include a first computing machine and a set of other machines, said first machine concurrently runs a first subset of multiple program units of said set of program units that correspond to multiple vertices of said graph, said set of other machines concurrently runs a second subset of multiple program units of said set of program units across different machines, said second subset of multiple program units correspond to multiple vertices of said graph. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. One or more processor readable storage devices having processor readable code stored thereon, said processor readable code programs one or more processors to perform a method comprising:
-
evaluating a definition of a customizable directed acyclic graph; building said graph based on said definition; accessing first code that implements a vertex on said graph, said code implements a portion of a system that counts frequency of items in a network search data unit; identifying multiple nodes that store said network search data unit; determining that a particular node of said multiple nodes is best suited for executing said first code; and sending instructions to execute said first code to said particular node because said particular node was determined to be best suited for executing said code. - View Dependent Claims (18, 19, 20)
-
Specification