Data caching for distributed execution computing
First Claim
1. A method, comprising:
- storing a plurality of subgraphs of a directed acyclic graph (DAG) in a remote global data storage of a distributed storage system;
processing multiple subgraphs of the plurality of subgraphs in multiple computing devices of a DAG distributed execution engine, each of the multiple computing devices produces a vertex based on a corresponding subgraph and corresponding input values;
processing a first subgraph of the plurality of subgraphs from the remote global data storage with associated input values in a computing device of the DAG distributed execution engine in a first iteration by at least producing a vertex using the first subgraph and the associated input values, and aggregating the vertex with vertices produced by the multiple computing devices to generate first output values;
storing a second subgraph that is a copy of the first subgraph into a local cache of the computing device via a network that connects the local cache to the remote global data storage; and
processing the second subgraph with the first output values to generate a second output values in response to determining that the computing device is to process the first subgraph in a second iteration,wherein the DAG represents a sparse matrix of linkage information between web pages, each of the associated input values represents a rank value of a source web page to associated destination web pages, and each output value represents a rank value of the source web page to other source web pages.
2 Assignments
0 Petitions
Accused Products
Abstract
Embodiments for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing a first subgraph of the plurality of subgraphs from the distributed storage system in the computing device. The first subgraph being processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph in a cache of the device. The second subgraph being a duplicate of the first subgraph. Moreover, the method also includes processing the second subgraph with the first output values to generate second output values if the device is to process the first subgraph in each of one or more subsequent iterations.
-
Citations
17 Claims
-
1. A method, comprising:
-
storing a plurality of subgraphs of a directed acyclic graph (DAG) in a remote global data storage of a distributed storage system; processing multiple subgraphs of the plurality of subgraphs in multiple computing devices of a DAG distributed execution engine, each of the multiple computing devices produces a vertex based on a corresponding subgraph and corresponding input values; processing a first subgraph of the plurality of subgraphs from the remote global data storage with associated input values in a computing device of the DAG distributed execution engine in a first iteration by at least producing a vertex using the first subgraph and the associated input values, and aggregating the vertex with vertices produced by the multiple computing devices to generate first output values; storing a second subgraph that is a copy of the first subgraph into a local cache of the computing device via a network that connects the local cache to the remote global data storage; and processing the second subgraph with the first output values to generate a second output values in response to determining that the computing device is to process the first subgraph in a second iteration, wherein the DAG represents a sparse matrix of linkage information between web pages, each of the associated input values represents a rank value of a source web page to associated destination web pages, and each output value represents a rank value of the source web page to other source web pages. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A computer readable memory having computer-executable instructions that, when executed, perform acts comprising:
-
storing a plurality of subgraphs of a directed acyclic graph (DAG) in a remote global data storage of a distributed storage system; processing multiple subgraphs of the plurality of subgraphs in multiple computing devices of a DAG distributed execution engine, each of the multiple computing devices producing a vertex based on a corresponding subgraph and corresponding input values; processing a first subgraph of the plurality of subgraphs from the remote global data storage with associated input values in a computing device of the DAG distributed execution engine in an iteration by at least producing a vertex using the first subgraph and the associated input values, and aggregating the vertex with vertices produced by the multiple computing devices to generate first output values; storing a second subgraph that is a copy of the first subgraph into a local cache of the computing device via a network that connects the local cache to the remote global data storage; and processing the second subgraph with the first output values to generate second output values in a subsequent iteration, wherein the DAG represents a sparse matrix of linkage information between web pages, each of the associated input values represents a rank value of a source web page to associated destination web pages, and each output value represents a rank value of the source web page to other source web pages. - View Dependent Claims (10, 11, 12, 13)
-
-
14. A Directed Acyclic Graph (DAG) distributed execution engine, the engine comprising:
-
a network; a plurality of computing devices connected by the network, each computing device including; one or more processors; and memory to store a plurality of computer-executable instructions for execution by the one or more processors, the computer-executable instructions comprising; storing a plurality of subgraphs of a DAG in a remote global data storage of a distributed storage system; processing multiple subgraphs of the plurality of subgraphs in multiple computing devices of the DAG distributed execution engine, wherein each of the multiple computing devices produces a vertex based on a corresponding subgraph and corresponding input values; processing a first subgraph of the plurality of subgraphs from the remote global data storage with associated input values in a computing device in a first iteration by at least producing a vertex using the first subgraph and the associated input values, and aggregating the vertex with vertices produced by the multiple computing devices to generate first output values; storing a second subgraph that is a copy of the first subgraph in a local cache of the computing device in response to determining that a number of times that the first subgraph has been processed by the computing device exceeds a predetermined threshold; and processing the second subgraph with the first output values to generate second output values in a second iteration that occurs after the first iteration, wherein the DAG represents a sparse matrix of linkage information between web pages, each of the associated input values represents a rank value of a source web page to associated destination web pages, and each output value represents a rank value of the source web page to other source web pages. - View Dependent Claims (15, 16, 17)
-
Specification