Allocating resources for parallel execution of query plans
First Claim
1. A method of allocating computing resources in a grid to an executable query plan, wherein the executable query plan is dividable into a plurality of query sub-plans at a plurality of query sub-plan levels in the executable query plan, the method comprising:
- repeatedly splitting the query plan into query plan segments and provisionally allocating computing resources in the grid to the query plan segments comprising query-sub plans until the query plan segments have no more than one query sub-plan level, thereby forming a provisional allocation;
computing sub-plan workload densities for respective query sub-plan levels of a query plan segment comprising the query plan;
wherein the splitting the query plan into query plan segments comprises;
splitting the query plan segment comprising the query plan at a point in the query plan segment comprising the query plan where a summation of level workload densities of different consecutive query sub-plan levels of the query sub-plan levels exceeds half of a total workload density for the query plan segment comprising the query plan;
wherein the computing resources in the grid comprise a first computing resource and a second computing resource and the query plan segments comprise a first query plan segment and a second query plan segment;
wherein provisionally allocating computing resources in the grid to the query plan segments comprises;
finding a shortest path between a source and a sink in a graph, the shortest path in the graph comprising a first edge representing a cost of processing the first query plan segment at the first computing resource and a second edge representing a cost of processing the second query plan segment at the second computing resource; and
distributing the computing resources in the grid among the query sub-plans in respective of the query plan segments according to the provisional allocation.
2 Assignments
0 Petitions
Accused Products
Abstract
Computing resources can be assigned to sub-plans within a query plan to effect parallel execution of the query plan. For example, computing resources in a grid can be represented by nodes, and a shortest path technique can be applied to allocate machines to the sub-plans. Computing resources can be provisionally allocated as the query plan is divided into query plan segments containing one or more sub-plans. Based on provisional allocations to the segments, the computing resources can then be allocated to the sub-plans within respective segments. Multiprocessor computing resources can be supported. The techniques can account for data locality. Both pipelined and partitioned parallelism can be addressed. Described techniques can be particularly suited for efficient execution of bushy query plans in a grid environment. Parallel processing will reduce the overall response time of the query.
5 Citations
19 Claims
-
1. A method of allocating computing resources in a grid to an executable query plan, wherein the executable query plan is dividable into a plurality of query sub-plans at a plurality of query sub-plan levels in the executable query plan, the method comprising:
-
repeatedly splitting the query plan into query plan segments and provisionally allocating computing resources in the grid to the query plan segments comprising query-sub plans until the query plan segments have no more than one query sub-plan level, thereby forming a provisional allocation; computing sub-plan workload densities for respective query sub-plan levels of a query plan segment comprising the query plan; wherein the splitting the query plan into query plan segments comprises; splitting the query plan segment comprising the query plan at a point in the query plan segment comprising the query plan where a summation of level workload densities of different consecutive query sub-plan levels of the query sub-plan levels exceeds half of a total workload density for the query plan segment comprising the query plan; wherein the computing resources in the grid comprise a first computing resource and a second computing resource and the query plan segments comprise a first query plan segment and a second query plan segment; wherein provisionally allocating computing resources in the grid to the query plan segments comprises; finding a shortest path between a source and a sink in a graph, the shortest path in the graph comprising a first edge representing a cost of processing the first query plan segment at the first computing resource and a second edge representing a cost of processing the second query plan segment at the second computing resource; and distributing the computing resources in the grid among the query sub-plans in respective of the query plan segments according to the provisional allocation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A computer-implemented method of allocating resources for executing a query according to a query plan, the method comprising:
-
receiving a representation of the query plan, wherein the query plan is dividable into a plurality of query sub-plans; dividing the query plan into a plurality of segments of the query plan having respective one or more query sub-plans of the query plan, the plurality of segments comprising a first segment and a second segment, wherein the dividing the query plan into the plurality of segments comprises; computing sub-plan workload densities for respective levels of a third segment; and dividing the third segment at a point in the third segment where a summation of level workload densities of different consecutive levels of the levels of the third segment exceeds half of a total workload density for the third segment; creating a graph representation of connections between the plurality of segments of the query plan, wherein the graph representation comprises a plurality of nodes representing respective computing resources of a plurality of computing resources and a plurality of connecting edges representing respective resource costs, wherein the plurality of connecting edges comprise a first connecting edge and a second connecting edge and wherein the plurality of computing resources comprise a first computing resource and a second computing resource; finding a shortest path between the first segment and the second segment in the graph representation, the shortest path comprises the first connecting edge representing a cost of processing the first segment at the first computing resource and the second connecting edge representing a cost of processing the second segment at the second computing resource; responsive to finding the shortest path, allocating computing resources associated with the shortest path to respective segments of the query plan; and allocating the computing resources to query sub-plans of the query plan according to allocations to the segments of the query plan, thereby forming an allocation of the computing resources to the query sub-plans; and outputting the allocation of the computing resources to the query sub-plans.
-
-
14. A computer-implemented method of indicating an allocation of computing resources to a query plan comprising a plurality of query sub-plans, the method comprising:
-
receiving a representation of the query plan; splitting the query plan into segments of the query plan, the segments of the query plan comprising a first query plan segment, a second query plan segment, and a third query plan segment, wherein the splitting the query plan into the segments of the query plan comprises; computing sub-plan workload densities for respective levels of the third query plan segment; and splitting the third query plan segment at a point in the third query plan segment where a summation of level workload densities of different consecutive levels of the levels of the third query plan segment exceeds half of a total workload density for the third query plan segment; constructing a graph representing connections between a plurality of the segments of the query plan comprising query sub-plans and the computing resources, wherein the plurality of the segments of the query plan comprise the first query plan segment and a the second query plan segment and the computing resources comprise a first computing resource and a second computing resource; performing a shortest path search on the graph, wherein the performing the shortest path search on the graph comprises; finding a shortest path between a source and a sink in the graph, the shortest path in the graph comprising a first edge representing a cost of processing the first query plan segment at the first computing resource and a second edge representing a cost of processing the second query plan segment at the second computing resource; based on results of the shortest path search, determining an allocation of computing resources to respective query sub-plans, removing allocated computing resources from the graph, and repeating the shortest path search; and outputting the allocation of computing resources to the respective query sub-plans. - View Dependent Claims (15)
-
-
16. One or more computer-readable storage media comprising computer-executable instructions causing a computer to perform a method comprising:
-
receiving a query plan; initially considering the query plan as a single query segment; repeatedly splitting query segments of the query plan having more than one level into child query segments until the child query segments have one level; after splitting a query segment, allocating machines to the child query segments of the query plan via repeatedly finding a shortest path in a graph representing costs of processing the segments at various machines in a grid and costs of communicating between machines in the grid, wherein nodes representing the allocated machines are removed from the graph after allocation; wherein finding the shortest path in the graph comprises finding a shortest path between a source representing a first child query segment in the graph and a sink representing a second child query segment in the graph, the shortest path in the graph comprising a first edge representing a cost of processing the first child query segment at a first computing resource, a second edge representing a cost of processing the second child query segment at the second computing resource, and a third edge representing a cost of communicating between the first computing resource and the second computing resource; when the query segments have been reduced to one level, distributing machines allocated to respective segments among query sub-plans within the respective segments based at least on consideration of a total time for respective of the machines to process respective of the query sub-plans, wherein the distributing comprises repeatedly performing (a)-(e); (a) constructing a table storing process cost values for respective permutations of query sub-plans and computing resources; (b) finding a minimum process cost value in the table; (c) allocating a computing resource associated with the minimum process cost value to a query sub-plan associated with the minimum process cost value; (d) removing the computing resource and the query sub-plan associated with the minimum process cost value from the table; and (e) determining whether allocating further resources to the query sub-plan associated with the minimum process cost value is profitable, and responsive to determining that allocating further computing resources to the query sub-plan is not profitable, ceasing to allocate further computing resources to the query sub-plan.
-
-
17. A system configured to allocate machines in a grid to a query plan, the system comprising:
-
at least one processor; memory; a stored query plan comprising a plurality of query sub-plans and associated stored estimates of an amount of data to be processed by respective sub-plans of the query plan; a machine allocator comprising a graph representation of query segments of the query plan and the machines, wherein the query plan is split into the query segments comprising the plurality of query sub-plans, wherein the splitting of the query plan comprises; computing sub-plan workload densities for respective levels of a query segment with more than one level; and splitting the query segment with more than one level at a point in the query segment with more than one level where a summation of level workload densities of different consecutive levels of the query plan segment with more than one level exceeds half of a total workload density for the query plan segment with more than one level; wherein the graph representation represents a cost of processing a first query segment at respective of the machines and a cost of communicating between the machines, wherein the machine allocator finds a shortest path between a source and a sink in the graph representation and allocates machines in the shortest path to respective query segments of the query plan, wherein the shortest path between the source and the sink in the graph representation comprises a first edge representing a cost of processing the first query segment at a first machine of the machines and a second edge representing a cost of processing a second query segment at a second machine of the machines; and a stored allocation of computing resources to respective sub-plans of the query plan. - View Dependent Claims (18, 19)
-
Specification