M X N dispatching in large scale distributed system
Abstract
M×N dispatching in a large scale distributed system is disclosed. In various embodiments, a query is received. A query plan is generated to perform the query. A subset of query processing segments is selected, from a set of available query processing segments, to perform an assigned portion of the query plan. An assignment to perform the assigned portion of the query plan is dispatched to the selected subset of query processing segments.
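The flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Assignment`, `dispatch`, and `toy_select`, the segment labels, and the portion labels are all hypothetical, and the selection policy is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Assignment:
    """One (segment, plan portion) pairing produced by the dispatcher."""
    segment: str
    portion: str


def dispatch(
    portions: Sequence[str],
    available: Sequence[str],
    select: Callable[[str, Sequence[str]], Sequence[str]],
) -> List[Assignment]:
    """For each plan portion, pick a subset of segments and assign the portion to it."""
    assignments = []
    for portion in portions:
        for segment in select(portion, available):
            assignments.append(Assignment(segment, portion))
    return assignments


def toy_select(portion: str, available: Sequence[str]) -> Sequence[str]:
    # Hypothetical policy: scan portions get three segments, anything else gets one.
    return available[:3] if portion.startswith("scan") else available[:1]


segments = ["seg0", "seg1", "seg2", "seg3"]
plan = dispatch(["scan_orders", "final_aggregate"], segments, toy_select)
```

Here `seg0` receives both portions while `seg3` receives none, which illustrates that each portion is sent only to the subset chosen for it rather than to every available segment.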
17 Claims
1. A method, comprising:
receiving a query;
generating, by a master node, a query plan to perform the query, wherein the generating of the query plan includes dividing the query plan into at least a first portion and a second portion, and wherein the master node comprises one or more hardware processors;
selecting, by the master node, from a set of available query processing segments a first subset of query processing segments to perform a first assigned portion of the query plan corresponding to the first portion of the query plan, and a second subset of query processing segments to perform a second assigned portion of the query plan corresponding to the second portion of the query plan, wherein at least one segment of the set of available query processing segments is included in the first subset and the second subset of query processing segments, wherein a first number of segments selected to perform the first portion of the query plan is dynamically determined according to one or both of (1) data locality of data corresponding to the first portion of the query plan associated with the query in relation to the first subset of query processing segments, and (2) available resources, wherein the first subset of query processing segments is selected based at least in part on a co-locality of one or more of the selected query processing segments with data with which the assigned portion of the query plan is associated, and wherein the first number of segments is selected to perform the first portion of the query plan and a second number of segments, different from the first number, is selected to perform the second portion of the query plan; and
dispatching to the selected first subset of query processing segments an assignment to perform the first assigned portion of the query plan, wherein the dispatching of the assignment to perform the first assigned portion of the query plan includes providing to the selected first subset of query processing segments with corresponding metadata that is obtained from a central metadata store, wherein the metadata provided to the corresponding selected first subset of query processing segments is determined to be used to perform the first assigned portion of the query plan.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 17
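One way to picture the selecting step of claim 1 is a function that sizes each subset from data locality and available resources. The sketch below is only an assumption-laden illustration: the `Segment` fields, the `free_slots` stand-in for "available resources", the host names, and the sizing rule are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    host: str
    free_slots: int  # assumed stand-in for "available resources"


def select_segments(segments, data_hosts, max_segments):
    """Pick segments for one plan portion.

    Segments on hosts that store the portion's data are preferred
    (co-locality). The subset size is capped by max_segments and by
    how many segments currently have free slots.
    """
    usable = [s for s in segments if s.free_slots > 0]
    # Stable sort: co-located segments first, original order otherwise.
    ranked = sorted(usable, key=lambda s: s.host not in data_hosts)
    local = sum(1 for s in usable if s.host in data_hosts)
    # Use every co-located segment up to the cap, but at least one if any is usable.
    count = min(max_segments, max(local, 1), len(usable))
    return ranked[:count]


pool = [
    Segment("seg0", "hostA", 2),
    Segment("seg1", "hostB", 1),
    Segment("seg2", "hostC", 0),  # no free slots, never selected
    Segment("seg3", "hostA", 1),
]
# Hypothetical data placement for two portions of the same plan.
first = select_segments(pool, data_hosts={"hostA", "hostB"}, max_segments=4)
second = select_segments(pool, data_hosts={"hostB"}, max_segments=4)
```

With this toy placement the first portion gets three segments and the second gets one, and `seg1` appears in both subsets, mirroring the claim's requirements that the two counts differ and that at least one segment is shared.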
10. A system, comprising:
a communication interface; and
one or more hardware processors coupled to the communication interface and configured to:
receive a query;
generate a query plan to perform the query, wherein the query plan is generated such that the query plan is divided into at least a first portion and a second portion;
select from a set of available query processing segments a first subset of query processing segments to perform a first assigned portion of the query plan, corresponding to the first portion of the query plan, and a second subset of query processing segments to perform a second assigned portion of the query plan corresponding to the second portion of the query plan, wherein at least one segment of the set of available query processing segments is included in the first subset and the second subset of query processing segments, wherein a first number of segments selected to perform the first portion of the query plan is dynamically determined according to one or both of (1) data locality of data corresponding to the first portion of the query plan associated with the query in relation to the first subset of query processing segments, and (2) available resources, wherein the first subset of query processing segments is selected based at least in part on a co-locality of one or more of the selected query processing segments with data with which the assigned portion of the query plan is associated, and wherein the first number of segments is selected to perform the first portion of the query plan and a second number of segments, different from the first number, is selected to perform the second portion of the query plan; and
dispatch to the selected first subset of query processing segments, via the communication interface, an assignment to perform the first assigned portion of the query plan, wherein to dispatch the assignment to perform the first assigned portion of the query plan includes providing to the selected first subset of query processing segments with corresponding metadata that is obtained from a central metadata store, wherein the metadata provided to the corresponding selected first subset of query processing segments is determined to be used to perform the first assigned portion of the query plan.
Dependent claims: 11, 12, 13, 14
15. A computer program product embodied in a tangible, non-transitory computer readable storage means, comprising computer instructions for:
receiving a query;
generating a query plan to perform the query, wherein the generating of the query plan includes dividing the query plan into at least a first portion and a second portion;
selecting from a set of available query processing segments a first subset of query processing segments to perform a first assigned portion of the query plan corresponding to the first portion of the query plan, and a second subset of query processing segments to perform a second assigned portion of the query plan corresponding to the second portion of the query plan, wherein at least one segment of the set of available query processing segments is included in the first subset and the second subset of query processing segments, wherein a first number of segments selected to perform the first portion of the query plan is dynamically determined according to one or both of (1) data locality of data corresponding to the first portion of the query plan associated with the query in relation to the first subset of query processing segments, and (2) available resources, wherein the first subset of query processing segments is selected based at least in part on a co-locality of one or more of the selected query processing segments with data with which the assigned portion of the query plan is associated, and wherein the first number of segments is selected to perform the first portion of the query plan and a second number of segments, different from the first number, is selected to perform the second portion of the query plan; and
dispatching to the selected first subset of query processing segments an assignment to perform the first assigned portion of the query plan, wherein the dispatching of the assignment to perform the first assigned portion of the query plan includes providing to the selected first subset of query processing segments with corresponding metadata that is obtained from a central metadata store, wherein the metadata provided to the corresponding selected first subset of query processing segments is determined to be used to perform the first assigned portion of the query plan.
Dependent claims: 16
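The dispatching step recited in each independent claim attaches metadata from a central store to the assignment. A minimal sketch of that idea, assuming a dictionary stands in for the central metadata store, might look like the following. `CATALOG`, `build_assignment`, the table names, and the file paths are hypothetical and chosen only for illustration.

```python
# Hypothetical central metadata store: table name -> metadata record.
CATALOG = {
    "orders": {"columns": ["id", "total"], "files": ["/data/orders/0", "/data/orders/1"]},
    "users": {"columns": ["id", "name"], "files": ["/data/users/0"]},
    "audit_log": {"columns": ["ts", "event"], "files": ["/data/audit/0"]},
}


def build_assignment(portion_id, tables, segments, catalog=CATALOG):
    """Bundle one plan portion with only the metadata its tables require."""
    # Look up just the entries this portion touches, not the whole catalog.
    metadata = {table: catalog[table] for table in tables}
    return [
        {"segment": seg, "portion": portion_id, "metadata": metadata}
        for seg in segments
    ]


messages = build_assignment("scan_orders", ["orders"], ["seg0", "seg1"])
```

Each message carries the `orders` entry and nothing else, so under this assumption a segment would not need to query the central store itself to run its portion.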
Specification