Parallel processing framework
Abstract
Data can be processed in parallel across a cluster of nodes using a parallel processing framework. Using Web services calls between components allows the number of nodes to be scaled as necessary, and allows developers to build applications on the framework using a Web services interface. A job scheduler works together with a queuing service to distribute jobs to nodes as the nodes have capacity, such that jobs can be performed in parallel as quickly as the nodes are able to process the jobs. Data can be loaded efficiently across the cluster, and levels of nodes can be determined dynamically to process queries and other requests on the system.
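The capacity-driven distribution described in the abstract can be illustrated with a minimal sketch. It assumes an in-process `queue.Queue` standing in for the queuing service and threads standing in for cluster nodes; the names `run_jobs`, `node_loop`, and `slots_per_node` are hypothetical and do not come from the patent, which describes Web services calls between components rather than threads.

```python
import queue
import threading


def run_jobs(jobs, node_names, slots_per_node=1):
    """Run (job_id, func, arg) jobs on simulated nodes that pull work when free.

    Assumption: a thread models one processing slot on a node, and a
    thread-safe in-memory queue models the queuing service.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def node_loop(name):
        # A slot takes the next job only when it has finished the previous
        # one, so faster nodes naturally end up processing more jobs.
        while True:
            try:
                job_id, func, arg = pending.get_nowait()
            except queue.Empty:
                return
            value = func(arg)
            with lock:
                results[job_id] = (name, value)

    threads = [
        threading.Thread(target=node_loop, args=(name,))
        for name in node_names
        for _ in range(slots_per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    jobs = [(i, lambda x: x * x, i) for i in range(10)]
    for job_id, (node, value) in sorted(run_jobs(jobs, ["node-a", "node-b"]).items()):
        print(job_id, node, value)
```

In this sketch the scheduler never assigns a job to a specific node. Nodes pull from the queue, which is one simple way to obtain the "as the nodes have capacity" behavior; which node handles which job varies from run to run.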
18 Claims
1. A computer-implemented method, comprising:
receiving, by a computer system, one or more queries;
identifying a first set of nodes of a system cluster associated with a plurality of nodes to process a first portion of the one or more queries, the first set of nodes being a first subset of the plurality of nodes;
scheduling a plurality of jobs corresponding to the one or more queries;
causing individual nodes of the first set of nodes to process the first portion of the one or more queries in parallel and in accordance with the scheduled plurality of jobs; and
determining a second set of nodes of the system cluster associated with the plurality of nodes based at least in part on a retention policy associated with a plurality of distributed databases of the system cluster, the second set of nodes being a second subset of the plurality of nodes, individual nodes of the second set of nodes being a parent node to at least one child node of the first set of nodes in a hierarchical node structure, wherein at least one node is able to reside in the first set and the second set of nodes such that the at least one node may operate as a child node during processing of the first portion of the one or more queries and a parent node during processing of a second portion of the one or more queries, the first set of nodes and the second set of nodes comprising an instance on a database of the plurality of distributed databases, each layer of the hierarchical node structure being formed based at least in part on first results of the first portion of the one or more queries.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system, comprising:
a processor; and
a memory device including instructions that, when executed with the processor, cause the system to, at least;
receive one or more queries;
identify a first set of nodes of a system cluster associated with a plurality of nodes, the first set of nodes being a first subset of the plurality of nodes;
schedule a plurality of jobs corresponding to the one or more queries;
cause individual nodes of the first set of nodes to process a first portion of the one or more queries in parallel and in accordance with the scheduled plurality of jobs;
determine a second set of nodes of the system cluster associated with the plurality of nodes based at least in part on a retention policy associated with a plurality of distributed databases of the system cluster, the second set of nodes being a second subset of the plurality of nodes, individual nodes of the second set of nodes being a parent node to at least one child node of the first set of nodes in a hierarchical node structure, wherein at least one node is able to reside in the first set and the second set of nodes, such that the at least one node may operate as a child node during processing of the first portion of the one or more queries and a parent node during processing of a second portion of the one or more queries, the first set of nodes and the second set of nodes comprising an instance on a database of the plurality of distributed databases, each layer of the hierarchical node structure being formed based at least in part on first results from the first portion of the one or more queries;
cause individual nodes of the second set of nodes to process the second portion of the one or more queries using the first results from the first portion of the one or more queries on individual respective child nodes; and
store second results of the second portion of the one or more queries to a specified location.
Dependent claims: 11, 12, 13, 14
15. A computer program product embedded in a non-transitory computer-readable storage medium, comprising:
program code for receiving one or more queries;
program code for identifying a first set of nodes of a system cluster associated with a plurality of nodes, the first set of nodes being a first subset of the plurality of nodes;
program code for scheduling a plurality of jobs corresponding to the one or more queries;
program code for causing individual nodes of the first set of nodes to process a first portion of the one or more queries in parallel and in accordance with the scheduled plurality of jobs;
program code for determining a second set of nodes of the system cluster associated with the plurality of nodes based at least in part on a retention policy associated with a plurality of distributed databases of the system cluster, the second set of nodes being a second subset of the plurality of nodes, individual nodes of the second set of nodes being a parent to at least one child node of the first set of nodes in a hierarchical node structure, wherein at least one node is able to reside in the first set and the second set of nodes such that the at least one node may operate as a child node during processing of the first portion of the one or more queries and a parent node during processing of a second portion of the one or more queries, the first set of nodes and the second set of nodes comprising an instance on a database of the plurality of distributed databases, each layer of the hierarchical node structure being formed based at least in part on first results of the first portion of the one or more queries;
program code for causing individual nodes of the second set of nodes to process the second portion of the one or more queries using the first results from the first portion of the one or more queries on individual respective child nodes; and
program code for storing second results of the second portion of the one or more queries to a specified location.
Dependent claims: 16, 17, 18
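The two-stage processing recited in the independent claims can be pictured with a rough sketch. Everything in it is an illustrative assumption rather than the claimed implementation: the query is a simple count, the first portion counts items per node, the parent layer is formed from whichever nodes returned non-empty first results, and the first node of each group doubles as the parent (so one node acts as both child and parent). The retention policy and the distributed databases are not modeled, and the names `first_portion`, `second_portion`, `run_hierarchical`, and `fan_in` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def first_portion(partition):
    """First portion of the query: count items held by one child node."""
    return Counter(partition)


def second_portion(child_results):
    """Second portion of the query: merge the first results of the children."""
    total = Counter()
    for partial in child_results:
        total.update(partial)
    return total


def run_hierarchical(partitions_by_node, fan_in=2):
    """Process partitions in parallel, then combine them one level up.

    Assumption: the parent layer is derived from the first results by
    skipping nodes whose result is empty and grouping the rest.
    """
    nodes = list(partitions_by_node)
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(first_portion, (partitions_by_node[n] for n in nodes)))
    first_results = dict(zip(nodes, partials))

    # Form the next layer based on the first results.
    productive = [n for n in nodes if first_results[n]]
    groups = [productive[i:i + fan_in] for i in range(0, len(productive), fan_in)]

    # The first node in each group serves as parent of the whole group,
    # so it is both a child (stage one) and a parent (stage two).
    parents = {group[0]: group for group in groups}
    second_results = {
        parent: second_portion(first_results[child] for child in children)
        for parent, children in parents.items()
    }
    return parents, second_results


if __name__ == "__main__":
    data = {"n1": ["a", "b", "a"], "n2": ["b"], "n3": [], "n4": ["c", "a"]}
    parents, merged = run_hierarchical(data)
    print(parents)
    print(merged)
```

A fuller model would repeat the grouping step until a single root remains and would write the final results to a specified location, as claims 10 and 15 recite; the sketch stops after one parent layer to stay short.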
Specification