Method and apparatus for performing query aware partitioning
First Claim
Patent Images
1. A method for processing a query, comprising:
- receiving, via a processor, a query plan comprising a plurality of queries;
classifying, via the processor, each one of the plurality of queries;
computing, via the processor, an optimal partition set for each one of the plurality of queries, wherein the optimal partition set maximizes an amount of data reduction that is performed locally by a node of the processor before transporting an intermediate result to a terminal node that produces a final result, wherein only leaf nodes are considered for a set of initial candidates;
reconciling, via the processor, the optimal partition set of each one of the plurality of queries with a subset of queries of the plurality of queries, wherein the reconciling is performed after the computing, wherein reconciling comprises;
testing the optimal partition set of each one of the plurality of queries against all other queries to ensure compatibility;
selecting an optimal partition set that is compatible with at least two queries of the plurality of queries and has a lowest cost based upon a lowest cost computation, wherein the lowest cost computation comprises a reconciled optimal partition set that provides a least amount of data transfer between a plurality of nodes, wherein a cost is defined as 0 when a query node of the query plan processes only local data, as an input rate of the query node when the query node is incompatible with the optimal partition set and as an output rate of the query node when the query node is compatible with the optimal partition set; and
using the optimal partition set for the at least two queries of the plurality of queries;
selecting, via the processor, the reconciled optimal partition set to be used by each query of the plurality of queries;
storing, via the processor, the reconciled optimal partition set in a computer readable medium;
applying, via the processor, the reconciled optimal partition set to the query plan to transform the query plan into an optimized query plan, wherein the applying the optimized query plan comprises;
assigning an operator to each node of a plurality of nodes that each node will execute, wherein at least two of the plurality of nodes perform different operators;
providing a parameter for each operator at each of the plurality of nodes; and
informing each node of the plurality of nodes a source and a destination of a data stream;
applying, via the processor, the optimized query plan to the data stream; and
outputting, via the processor, a result of the applying to a user.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for providing query aware partitioning are disclosed. For example, the method receives a query plan comprising a plurality of queries, and classifies each one of the plurality of queries. The method computes an optimal partition set for each one of the plurality of queries, and reconciles the optimal partition set of each one of the plurality of queries with at least one subset of queries of the plurality of queries. The method selects at least one reconciled optimal partition set to be used by each query of the plurality of queries, and stores the selected at least one reconciled optimal partition set in a computer readable medium.
35 Citations
9 Claims
-
1. A method for processing a query, comprising:
-
receiving, via a processor, a query plan comprising a plurality of queries; classifying, via the processor, each one of the plurality of queries; computing, via the processor, an optimal partition set for each one of the plurality of queries, wherein the optimal partition set maximizes an amount of data reduction that is performed locally by a node of the processor before transporting an intermediate result to a terminal node that produces a final result, wherein only leaf nodes are considered for a set of initial candidates; reconciling, via the processor, the optimal partition set of each one of the plurality of queries with a subset of queries of the plurality of queries, wherein the reconciling is performed after the computing, wherein reconciling comprises; testing the optimal partition set of each one of the plurality of queries against all other queries to ensure compatibility; selecting an optimal partition set that is compatible with at least two queries of the plurality of queries and has a lowest cost based upon a lowest cost computation, wherein the lowest cost computation comprises a reconciled optimal partition set that provides a least amount of data transfer between a plurality of nodes, wherein a cost is defined as 0 when a query node of the query plan processes only local data, as an input rate of the query node when the query node is incompatible with the optimal partition set and as an output rate of the query node when the query node is compatible with the optimal partition set; and using the optimal partition set for the at least two queries of the plurality of queries; selecting, via the processor, the reconciled optimal partition set to be used by each query of the plurality of queries; storing, via the processor, the reconciled optimal partition set in a computer readable medium; applying, via the processor, the reconciled optimal partition set to the query plan to transform the query plan into an optimized query plan, wherein the applying the optimized query plan comprises; assigning an operator to each node of a plurality of nodes that each node will execute, wherein at least two of the plurality of nodes perform different operators; providing a parameter for each operator at each of the plurality of nodes; and informing each node of the plurality of nodes a source and a destination of a data stream; applying, via the processor, the optimized query plan to the data stream; and outputting, via the processor, a result of the applying to a user. - View Dependent Claims (2, 3, 4)
-
-
5. A non-transitory computer-readable medium storing a plurality of instructions, which when executed by a processor, cause the processor to perform operations for processing a query, the operations comprising:
-
receiving a query plan comprising a plurality of queries; classifying each one of the plurality of queries; computing an optimal partition set for each one of the plurality of queries, wherein the optimal partition set maximizes an amount of data reduction that is performed locally by a node of the processor before transporting an intermediate result to a terminal node that produces a final result, wherein only leaf nodes are considered for a set of initial candidates; reconciling the optimal partition set of each one of the plurality of queries with a subset of queries of the plurality of queries, wherein the reconciling is performed after the computing, wherein reconciling comprises; testing the optimal partition set of each one of the plurality of queries against all other queries to ensure compatibility; selecting an optimal partition set that is compatible with at least two queries of the plurality of queries and has a lowest cost based upon a lowest cost computation, wherein the lowest cost computation comprises a reconciled optimal partition set that provides a least amount of data transfer between a plurality of nodes, wherein a cost is defined as 0 when a query node of the query plan processes only local data, as an input rate of the query node when the query node is incompatible with the optimal partition set and as an output rate of the query node when the query node is compatible with the optimal partition set; and using the optimal partition set for the at least two queries of the plurality of queries; selecting the reconciled optimal partition set to be used by each query of the plurality of queries; storing the reconciled optimal partition set in a computer readable medium, applying the reconciled optimal partition set to the query plan to transform the query plan into an optimized query plan, wherein the applying the optimized query plan comprises; assigning an operator to each node of a plurality of nodes that each node will execute, wherein at least two of the plurality of nodes perform different operators; providing a parameter for each operator at each of the plurality of nodes; and informing each node of the plurality of nodes a source and a destination of a data stream; applying the optimized query plan to the data stream; and outputting a result of the applying to a user. - View Dependent Claims (6, 7, 8)
-
-
9. An apparatus for processing a query, comprising:
-
a hardware processor; and a computer-readable medium storing a plurality of instructions, which when executed by the hardware processor, cause the processor to perform operations, the operations comprising; receiving a query plan comprising a plurality of queries; classifying each one of the plurality of queries; computing an optimal partition set for each one of the plurality of queries, wherein the optimal partition set maximizes an amount of data reduction that is performed locally by a node of the hardware processor before transporting an intermediate result to a terminal node that produces a final result, wherein only leaf nodes are considered for a set of initial candidates; reconciling the optimal partition set of each one of the plurality of queries with a subset of queries of the plurality of queries, wherein the reconciling is performed after the computing, wherein reconciling comprises; testing the optimal partition set of each one of the plurality of queries against all other queries to ensure compatibility; selecting an optimal partition set that is compatible with at least two queries of the plurality of queries and has a lowest cost based upon a lowest cost computation, wherein the lowest cost computation comprises a reconciled optimal partition set that provides a least amount of data transfer between a plurality of nodes, wherein a cost is defined as 0 when a query node of the query plan processes only local data, as an input rate of the query node when the query node is incompatible with the optimal partition set and as an output rate of the query node when the query node is compatible with the optimal partition set; and using the optimal partition set for the at least two queries of the plurality of queries; selecting the reconciled optimal partition set to be used by each query of the plurality of queries; storing the reconciled optimal partition set in a computer readable medium; applying the reconciled optimal partition set to the query plan to transform the query plan into an optimized query plan, wherein the applying the optimized query plan comprises; assigning an operator to each node of a plurality of nodes that each node will execute, wherein at least two of the plurality of nodes perform different operators; providing a parameter for each operator at each of the plurality of nodes; and informing each node of the plurality of nodes a source and a destination of a data stream; applying the optimized query plan to the data stream; and outputting a result of the applying to a user.
-
Specification