Optimizer statistics and cost model for in-memory tables
First Claim
Patent Images
1. A method comprising:
- receiving, at a database server, a query that requires performance of particular operations;
wherein the particular operations include operations that access a particular set of data items;
wherein copies of the data items of the particular set of data items reside in a particular row-major table on persistent storage;
wherein additional copies of at least a subset of the particular set of data items reside, in volatile memory, in one or more column-major in-memory compression-units;
evaluating a plurality of alternative execution plans for performance of the particular operations;
wherein the plurality of alternative execution plans includes at least;
a first execution plan that performs a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units;
a second execution plan that does not perform a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units;
for each execution plan of the plurality of alternative execution plans, determining a cost associated with the execution plan;
wherein determining the cost for the first execution plan includes at least one of;
determining what fraction of the particular row-major table is duplicated in the one or more column-major in-memory compression units;
determining how many column-major in-memory compression-units will need to be scanned, as part of the full table scan, after pruning non-matching column-major in-memory compression-units based on (a) filter predicates of the query, and (b) minimum and maximum values maintained for each column-major in-memory compression-unit;
determining a decompression cost for decompressing those column-major in-memory compression-units to be used during performance of the full table scan;
determining a row-stitching cost that represents overhead required to produce row-major results from data items extracted from column-major in-memory compression-units;
ordetermining a journal-scan cost for scanning a journal, maintained in volatile memory, that contains updates for data items that have become invalid in the one or more column-major in-memory compression-units;
selecting a particular execution plan, of the plurality of alternative execution plans, based on the cost determined for the particular execution plan;
obtaining a result set for the query by executing the particular execution plan; and
returning the result set of the query.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques are provided for determining costs for alternative execution plans for a query, where at least a portion of the data items required by the query are in in-memory compression-units within volatile memory. The techniques involve maintaining in-memory statistics, such as statistics that indicate what fraction of a table is currently present in in-memory compression units, and the cost of decompressing in-memory compression units. Those statistics are used to determine, for example, the cost of a table scan that retrieves some or all of the necessary data items from the in-memory compression-units.
-
Citations
24 Claims
-
1. A method comprising:
-
receiving, at a database server, a query that requires performance of particular operations; wherein the particular operations include operations that access a particular set of data items; wherein copies of the data items of the particular set of data items reside in a particular row-major table on persistent storage; wherein additional copies of at least a subset of the particular set of data items reside, in volatile memory, in one or more column-major in-memory compression-units; evaluating a plurality of alternative execution plans for performance of the particular operations; wherein the plurality of alternative execution plans includes at least; a first execution plan that performs a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units; a second execution plan that does not perform a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units; for each execution plan of the plurality of alternative execution plans, determining a cost associated with the execution plan; wherein determining the cost for the first execution plan includes at least one of; determining what fraction of the particular row-major table is duplicated in the one or more column-major in-memory compression units; determining how many column-major in-memory compression-units will need to be scanned, as part of the full table scan, after pruning non-matching column-major in-memory compression-units based on (a) filter predicates of the query, and (b) minimum and maximum values maintained for each column-major in-memory compression-unit; determining a decompression cost for decompressing those column-major in-memory compression-units to be used during performance of the full table scan; determining a row-stitching cost that represents overhead required to produce row-major results from data items extracted from column-major in-memory compression-units;
ordetermining a journal-scan cost for scanning a journal, maintained in volatile memory, that contains updates for data items that have become invalid in the one or more column-major in-memory compression-units; selecting a particular execution plan, of the plurality of alternative execution plans, based on the cost determined for the particular execution plan; obtaining a result set for the query by executing the particular execution plan; and returning the result set of the query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method comprising:
-
receiving, at a database server, a query that requires performance of particular operations; wherein the particular operations include operations that access particular data items; wherein copies of the particular data items reside in a particular row-major table on persistent storage; wherein additional copies of at least a subset of the particular data items reside, in volatile memory, in one or more column-major in-memory compression-units; splitting the query into; a first query that accesses portions of the particular table that are not duplicated in the one or more column-major in-memory compression units, and a second query that accesses portions of the particular table that are duplicated in the one or more column-major in-memory compression units; evaluating a first plurality of alternative execution plans for the first query; wherein evaluating the first plurality of alternative execution plans includes determining costs for each of the first plurality of alternative execution plans; evaluating a second plurality of alternative execution plans for the second query; wherein evaluating the second plurality of alternative execution plans includes determining costs for each of the second plurality of alternative execution plans; selecting a first particular execution plan, of the first plurality of alternative execution plans, based on the cost determined for the first particular execution plan; selecting a second particular execution plan, of the second plurality of alternative execution plans, based on the cost determined for the second particular execution plan; obtaining a result set for the query by; executing the first particular execution plan to produce a first result set, executing the second particular execution plan to produce a second result set, and combining the first result set and the second result set into a combined result set for the query; and returning the combined result set for the query. - View Dependent Claims (11, 12)
-
-
13. One or more non-transitory computer-readable media storing instructions which, when executed by one or more processors, cause selection of execution plans for queries submitted to a dual-format database system by:
-
receiving, at a database server, a query that requires performance of particular operations; wherein the particular operations include operations that access a particular set of data items; wherein copies of the data items of the particular set of data items reside in a particular row-major table on persistent storage; wherein additional copies of at least a subset of the particular set of data items reside, in volatile memory, in one or more column-major in-memory compression-units; evaluating a plurality of alternative execution plans for performance of the particular operations; wherein the plurality of alternative execution plans includes at least; a first execution plan that performs a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units; a second execution plan that does not perform a full table scan that accesses copies of data items in the one or more column-major in-memory compression-units; for each execution plan of the plurality of alternative execution plans, determining a cost associated with the execution plan; wherein determining the cost for the first execution plan includes at least one of; determining what fraction of the particular row-major table is duplicated in the one or more column-major in-memory compression units; determining how many column-major in-memory compression-units will need to be scanned, as part of the full table scan, after pruning non-matching column-major in-memory compression-units based on (a) filter predicates of the query, and (b) minimum and maximum values maintained for each column-major in-memory compression-unit; determining a decompression cost for decompressing those column-major in-memory compression-units to be used during performance of the full table scan; determining a row-stitching cost that represents overhead required to produce row-major results from data items extracted from column-major in-memory compression-units;
ordetermining a journal-scan cost for scanning a journal, maintained in volatile memory, that contains updates for data items that have become invalid in the one or more column-major in-memory compression-units; selecting a particular execution plan, of the plurality of alternative execution plans, based on the cost determined for the particular execution plan; obtaining a result set for the query by executing the particular execution plan; and returning the result set of the query. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. One or more non-transitory computer-readable media storing instructions which, when executed by one or more processors, cause:
-
receiving, at a database server, a query that requires performance of particular operations; wherein the particular operations include operations that access particular data items; wherein copies of the particular data items reside in a particular row-major table on persistent storage; wherein additional copies of at least a subset of the particular data items reside, in volatile memory, in one or more column-major in-memory compression-units; splitting the query into; a first query that accesses portions of the particular table that are not duplicated in the one or more column-major in-memory compression units, and a second query that accesses portions of the particular table that are duplicated in the one or more column-major in-memory compression units; evaluating a first plurality of alternative execution plans for the first query; wherein evaluating the first plurality of alternative execution plans includes determining costs for each of the first plurality of alternative execution plans; evaluating a second plurality of alternative execution plans for the second query; wherein evaluating the second plurality of alternative execution plans includes determining costs for each of the second plurality of alternative execution plans; selecting a first particular execution plan, of the first plurality of alternative execution plans, based on the cost determined for the first particular execution plan; selecting a second particular execution plan, of the second plurality of alternative execution plans, based on the cost determined for the second particular execution plan; obtaining a result set for the query by; executing the first particular execution plan to produce a first result set, executing the second particular execution plan to produce a second result set, and combining the first result set and the second result set into a combined result set for the query; and returning the combined result set for the query. - View Dependent Claims (23, 24)
-
Specification