System, method, and computer program product for progressive query processing

US 7,716,215 B2
Filed: 11/14/2007
Issued: 05/11/2010
Est. Priority Date: 10/31/2003
Status: Expired due to Fees

First Claim

Patent Images

1. A computer-implemented method for accelerating database query processing, the method comprising:

placing a number of checkpoints in a particular query execution plan generated for a database query, the particular query execution plan being based on estimated optimization parameter values derived from historic database statistics,wherein a first subset of the number of checkpoints are placed at a first set of points in the particular query execution plan, the first set of points comprising points where an entire intermediate result is materialized before proceeding with further operators in the particular query execution plan, andwherein an explicit materialization is added to the particular query execution plan just before a second subset of the number of checkpoints to cause materialization of an intermediate result, the second subset of the number of checkpoints comprising checkpoints placed on the outer side of nested-loop joins;

executing the particular query execution plan, wherein the executing selectively retains temporarily materialized views by storing partial query results until a lazy removal condition occurs, the lazy removal conditions comprising (a) updating of at least one table contributing to a materialized view, and (b) determining that new storage space is needed for other materialized views;

computing, during the executing the particular execution plan upon encountering a given checkpoint within the number of checkpoints, a difference between the estimated optimization parameter values and actual optimization parameter values observed during the executing the particular query execution plan to determine a significance of parameter estimation errors, wherein the actual optimization parameter values comprise cardinality, memory, communication costs, and I/O operations;

determining, at the given checkpoint and based on the computing, that an amount of query execution time for the particular query execution plan is above a remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold;

suspending, in response to determining that the amount of query execution time is above the remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold, execution of the particular query execution plan;

re-optimizing, in response to determining that the amount of query execution time is above the remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold, the particular query, wherein the re-optimizing comprises;

generating a number of alternative query execution plans, including query execution plans using the intermediate result and query execution plans that do not use the intermediate result;

assigning a respective execution cost to each alternative query execution plan, the respective execution cost being based upon database execution statistics determined during the executing; and

choosing, based on the respective execution cost of each alternative query execution plan, an optimal alternative query execution plan as a re-optimized query plan;

restarting query execution with the re-optimized query plan, wherein execution of the re-optimized query plan selectively reuses the intermediate results;

buffering rows produced by the executing until the checkpoint is evaluated so as to enable pipelining with some delay, wherein exhaustion of temporary space used for the buffering triggers the re-optimization;

transferring, prior to the restarting, each result row produced by the executing the particular query execution plan to a parent operator in a pipelined manner, the transferring comprising storing respective identifiers for the each result row into a side table using an insert query execution operator just below a return query plan operator;

executing, after the restarting, an anti join between the side table and a new result stream to compensate for the rows returned, wherein executing the anti-join comprises identifying records in the side table by a respective unique derived record identifier assigned to respective records within the side table during the executing the particular query execution plan;

selectively returning records at each execution cycle when the records have not previously been returned; and

outputting query results.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method, system, and computer program product to make query processing more robust in the face of optimization errors. The invention validates the statistics and assumptions used for compiling a query as the query is executed and, when necessary, progressively re-optimizes the query in mid-execution based on the knowledge learned during its partial execution. The invention selectively places a number of CHECK operators in a query execution plan to validate the optimizer'"'"'s cardinality estimates against actual cardinalities. Errors beyond a threshold trigger re-optimization, and the optimizer decides whether the old plan is still optimal and whether to re-use previously computed results. The invention addresses arbitrary SQL queries whose plans can contain sub-queries, updates, trigger checking, and view maintenance operations. The invention can handle concurrent update transactions or updates of common sub-expressions in a query execution plan without compromising consistency and isolation as locking information is tied to the record ID.

92 Citations

View as Search Results

1 Claim

1. A computer-implemented method for accelerating database query processing, the method comprising:
- placing a number of checkpoints in a particular query execution plan generated for a database query, the particular query execution plan being based on estimated optimization parameter values derived from historic database statistics,wherein a first subset of the number of checkpoints are placed at a first set of points in the particular query execution plan, the first set of points comprising points where an entire intermediate result is materialized before proceeding with further operators in the particular query execution plan, andwherein an explicit materialization is added to the particular query execution plan just before a second subset of the number of checkpoints to cause materialization of an intermediate result, the second subset of the number of checkpoints comprising checkpoints placed on the outer side of nested-loop joins;
  
  executing the particular query execution plan, wherein the executing selectively retains temporarily materialized views by storing partial query results until a lazy removal condition occurs, the lazy removal conditions comprising (a) updating of at least one table contributing to a materialized view, and (b) determining that new storage space is needed for other materialized views;
  
  computing, during the executing the particular execution plan upon encountering a given checkpoint within the number of checkpoints, a difference between the estimated optimization parameter values and actual optimization parameter values observed during the executing the particular query execution plan to determine a significance of parameter estimation errors, wherein the actual optimization parameter values comprise cardinality, memory, communication costs, and I/O operations;
  
  determining, at the given checkpoint and based on the computing, that an amount of query execution time for the particular query execution plan is above a remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold;
  
  suspending, in response to determining that the amount of query execution time is above the remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold, execution of the particular query execution plan;
  
  re-optimizing, in response to determining that the amount of query execution time is above the remaining execution time threshold and that the significance of parameter estimation errors is above an error threshold, the particular query, wherein the re-optimizing comprises;
  
  generating a number of alternative query execution plans, including query execution plans using the intermediate result and query execution plans that do not use the intermediate result;
  
  assigning a respective execution cost to each alternative query execution plan, the respective execution cost being based upon database execution statistics determined during the executing; and
  
  choosing, based on the respective execution cost of each alternative query execution plan, an optimal alternative query execution plan as a re-optimized query plan;
  
  restarting query execution with the re-optimized query plan, wherein execution of the re-optimized query plan selectively reuses the intermediate results;
  
  buffering rows produced by the executing until the checkpoint is evaluated so as to enable pipelining with some delay, wherein exhaustion of temporary space used for the buffering triggers the re-optimization;
  
  transferring, prior to the restarting, each result row produced by the executing the particular query execution plan to a parent operator in a pipelined manner, the transferring comprising storing respective identifiers for the each result row into a side table using an insert query execution operator just below a return query plan operator;
  
  executing, after the restarting, an anti join between the side table and a new result stream to compensate for the rows returned, wherein executing the anti-join comprises identifying records in the side table by a respective unique derived record identifier assigned to respective records within the side table during the executing the particular query execution plan;
  
  selectively returning records at each execution cycle when the records have not previously been returned; and
  
  outputting query results.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Hulu LLC (The Walt Disney Company)
Original Assignee
International Business Machines Corporation
Inventors
Volker, Marki, Raman, Vijayshankar, Lohman, Guy Maring, Pirahesh, Mir Hamid, Simmen, David Everett
Primary Examiner(s)
Rones; Charles
Assistant Examiner(s)
Gebresenbet; Dinku W

Application Number

US11/940,185
Publication Number

US 20080177722A1
Time in Patent Office

909 Days
Field of Search

None
US Class Current

707/721
CPC Class Codes

G06F 16/90335   Query processing

Y10S 707/99931   Database or file accessing

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99935   Query augmenting and refini...

System, method, and computer program product for progressive query processing

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

92 Citations

1 Claim

Specification

Use Cases

Quick Links

Others

System, method, and computer program product for progressive query processing

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

92 Citations

1 Claim

Specification

Subscription Required

Use Cases

Quick Links

Others