Distributed parallel determination of single and multiple source shortest paths in large directed graphs
First Claim
1. A method of updating nodes in a nearest seed computation, comprising:
- maintaining a nearest seed distance table in random access memory on a first server for first nodes, the first nodes being a portion of the nodes of a directed graph representation of application data;
maintaining a link table in mass storage memory on the first server,the link table and the distance table each being ordered identically by node identifier, andthe link table having, for each of the first nodes, information identifying all of the outgoing edges in the directed graph from the first node, each outgoing edge connecting the first node to a respective target node of the first node;
identifying dirty nodes in the distance table, dirty nodes being nodes having dirty nearest seed information for seeds recorded as being within a threshold distance of the nodes;
looking up the dirty nodes in the link table and obtaining from the link table next node information about outgoing edges and target nodes for the dirty nodes; and
propagating updates of nearest seed information to one or more second servers that own the target nodes identified in the next node information, the target nodes being other nodes of the directed graph.
2 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for checkpointing a computation distributed over multiple peer servers. On each server, sequentially storing checkpoints collectively representing a current state of the computation on that server as of a most recent checkpoint, each checkpoint having a checkpoint timestamp. When restarting a first server, rebuilding a most recent state of the first server from the checkpoints written by the first server through a most recent checkpoint having a most recent checkpoint timestamp, and requesting from each of the other peer servers updates from the most recent checkpoint timestamp time of the first server. On each server, in response to a first request for updates as of a particular time, deriving the requested updates from the state data in the server uncommitted to a checkpoint and the state data in checkpoints of the server that have a timestamp no earlier than the particular time of the first request, and providing the requested updates to the first server.
-
Citations
19 Claims
-
1. A method of updating nodes in a nearest seed computation, comprising:
-
maintaining a nearest seed distance table in random access memory on a first server for first nodes, the first nodes being a portion of the nodes of a directed graph representation of application data; maintaining a link table in mass storage memory on the first server, the link table and the distance table each being ordered identically by node identifier, and the link table having, for each of the first nodes, information identifying all of the outgoing edges in the directed graph from the first node, each outgoing edge connecting the first node to a respective target node of the first node; identifying dirty nodes in the distance table, dirty nodes being nodes having dirty nearest seed information for seeds recorded as being within a threshold distance of the nodes; looking up the dirty nodes in the link table and obtaining from the link table next node information about outgoing edges and target nodes for the dirty nodes; and propagating updates of nearest seed information to one or more second servers that own the target nodes identified in the next node information, the target nodes being other nodes of the directed graph. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A system for performing nearest seed computations, comprising:
-
multiple servers, each server having a link table stored on mass storage, a distance table stored in random access memory, the link table and the distance table each storing information about nodes assigned to the server, the information being stored in an order sorted by node identifier; the link tables of the multiple servers collectively containing a complete representation of a directed graph of nodes and edges, the representation of each of the nodes being assigned to exactly one of the multiple servers; each of the servers performing in parallel a portion of a distributed nearest seed computation, wherein each of the servers performing a portion of a distributed nearest seed computation comprises; identifying dirty nodes in the distance table of the server, dirty nodes being nodes having dirty nearest seed information for seeds recorded as being within a threshold distance of the nodes; looking up the dirty nodes in the link table of the server and obtaining from the link table next node information about outgoing edges and target nodes for the dirty nodes; and propagating updates of nearest seed information to one or more other servers that own the target nodes identified in the next node information, the target nodes being other nodes of the directed graph. - View Dependent Claims (7, 8, 9)
-
-
10. A method, comprising:
-
maintaining, in mass storage memory of a server, a graph extract file storing node-edge data describing a portion of a directed graph, the portion comprising a plurality of nodes and the outgoing edges from the nodes in the plurality of nodes, the graph extract file being sorted by identifiers of the plurality of nodes and containing information identifying for each outgoing edge a target node to which the edge is directed; maintaining, in a random access memory of the server, a distance table having a record for each of the plurality of nodes, the records having the same sort order as the sort order of the graph extract file, each record containing nearest seed information for a corresponding node; scanning, by the server, the distance table in the sort order to identify dirty nodes, dirty nodes being nodes having dirty nearest seed information for seeds within a threshold distance of the nodes; and looking up the dirty nodes in the graph extract file as the distance table is being scanned, thereby reading portions from the graph extract file in a look ahead order from beginning to end. - View Dependent Claims (11, 12, 13, 14)
-
-
15. A system comprising:
-
a server configured to perform operations comprising; maintaining a nearest seed distance table for first nodes in random access memory, the first nodes being a portion of the nodes of a directed graph representation of application data; maintaining a link table in mass storage memory, the link table and the distance table each being ordered identically by node identifier, and the link table having, for each of the first nodes, information identifying all of the outgoing edges in the directed graph from the first node, each outgoing edge connecting the first node to a respective target node of the first node; identifying dirty nodes in the distance table, dirty nodes being nodes having dirty nearest seed information for seeds recorded as being within a threshold distance of the nodes; looking up the dirty nodes in the link table and obtaining from the link table next node information about outgoing edges and target nodes for the dirty nodes; and propagating updates of nearest seed information to one or more second servers that own the target nodes identified in the next node information, the target nodes being other nodes of the directed graph. - View Dependent Claims (16, 17, 18, 19)
-
Specification