Run-time optimizations of queries with SQL spreadsheet
First Claim
1. A method for processing a query, the method comprising the steps of:
- examining said query;
wherein said query evaluates to a relation;
wherein the query defines an array and a set of formulas that reference the array;
wherein the array has one or more dimensions that correspond to columns of the relation;
wherein the query includes a clause specifying at least one partition column of said relation by which to partition the relation into partitions;
assigning a plurality of partitions to processing elements for processing, wherein the relation is partitioned into said plurality of partitions based on a set of one or more partition columns that include said at least one partition column; and
processing said query based on the assignment of said plurality of partitions.
1 Assignment
0 Petitions
Accused Products
Abstract
Described herein are optimizations and execution strategies for spreadsheet extensions to SQL. The partitioning of data, as specified in a spreadsheet clause, provides a way to parallelize the computation of spreadsheet and to provide and improve scalability. Even if the partitioning is not explicitly specified in the spreadsheet clause, the database optimizer can automatically infer the partitioning in some cases. Efficient hash based access structures on relations can be used for symbolic array addressing, enabling fast computation of formulas. When rewriting SQL statements, formulas whose results are not referenced in outer blocks can be removed from the spreadsheet clause, thus removing unnecessary computations. The predicates from other query blocks can be moved inside query blocks with spreadsheets clauses, thus considerably reducing the amount of data to be processed. Conditions for validity of this transformation are given.
88 Citations
25 Claims
-
1. A method for processing a query, the method comprising the steps of:
-
examining said query;
wherein said query evaluates to a relation;
wherein the query defines an array and a set of formulas that reference the array;
wherein the array has one or more dimensions that correspond to columns of the relation;
wherein the query includes a clause specifying at least one partition column of said relation by which to partition the relation into partitions;
assigning a plurality of partitions to processing elements for processing, wherein the relation is partitioned into said plurality of partitions based on a set of one or more partition columns that include said at least one partition column; and
processing said query based on the assignment of said plurality of partitions. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for determining an order for evaluating a set of formulas defined by a query, the method comprising the steps of:
-
examining said query;
wherein said query evaluates to a relation;
wherein the query defines an array and the set of formulas, wherein the set of formulas reference the array;
wherein the array has one or more dimensions that correspond to columns of the relation;
grouping the set of formulas into levels, wherein each level includes one or more formulas of said set of formulas, wherein the levels satisfy a first set of one or more criteria, wherein said first set of one or more criteria include;
evaluation of any formula in each level of said levels does not depend;
on another formula in the same level, and on another formula in a level higher than said each level;
evaluating the formulas in each of the levels, wherein;
the one or more formulas in the level are evaluated before another formula in a higher level is evaluated; and
if said each level includes one or more formulas that specify an aggregate function, computing each aggregate function specified by the one or more formulas before evaluating a formula in said each level. - View Dependent Claims (9, 10, 11)
-
-
12. A method for evaluating a set of formulas defined by a query, the method comprising the steps of:
-
examining said query;
wherein said query evaluates to a relation;
wherein the query defines an array and the set of formulas;
wherein the set of formulas reference the array;
wherein the array has one or more dimensions that correspond to columns of the relation;
establishing a subset of said set of formulas as not being acyclic;
grouping the set of formulas into levels, wherein each level includes one or more formulas of said set of formulas, wherein the step of grouping includes assigning each formula of said subset to consecutive levels;
wherein the levels satisfy a first set of one or more criteria, wherein said first set of one or more criteria include;
if a particular level of said levels belongs to the consecutive levels, evaluation of the formula in the particular level does not depend on a formula in a higher level that does not belong to the consecutive levels;
if the particular level of said levels does not belong to the consecutive levels, then;
evaluation of any formula in the particular level of said levels does not depend;
on another formula in the particular level, and on another formula in a level higher than said particular level; and
evaluating the set of formulas in an order based on said levels, wherein the step of evaluating includes determining whether the formulas in the subset converge. - View Dependent Claims (13, 14, 15, 16, 17)
-
-
18. A method for accessing data needed to process a query, the method comprising the steps of:
-
examining said query;
wherein said query evaluates to a relation with rows;
wherein the query defines an array with cells and a set of formulas that reference the array;
wherein the array has one or more dimensions that correspond to a column of the relation as a dimension column;
wherein each cell of said cells corresponds to a row in said relation;
wherein the query includes a clause specifying at least one partition column of said relation by which to partition the relation into partitions that each contain rows within said relation;
accessing data for each partition in a multi-level hash access structure having a first level and a second level and a plurality of tables;
wherein the first level correlates each partition of said partitions to a particular table of said plurality of tables based on values in said at least one partition column, said particular table containing records holding data for the cells of said array the correspond to said each partition; and
wherein the second level correlates records within the particular table for a partition to values in the one or more dimension columns. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
-
Specification