Techniques for evaluating query predicates during in-memory table scans
First Claim
1. A method comprising:
- prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating, from the at least one predicate, a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query;
wherein the method is performed by one or more nodes.
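The claim's core step, comparing compressed values against a condition without decompressing them, can be sketched with dictionary encoding, one common compression scheme for columnar units. This is an illustrative sketch, not the patent's implementation; all names are made up, and the claim does not mandate dictionary encoding.

```python
# Sketch: a range predicate on values becomes a range predicate on
# dictionary codes, so the scan never decodes a compressed value.
import bisect

def compress_chunk(values):
    """Dictionary-encode one chunk of a column (a 'columnar unit')."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes

def filter_greater_than(dictionary, codes, bound):
    """Return row positions where value > bound, comparing codes only.

    Because the dictionary is sorted, value > bound is equivalent to
    code >= the first code whose value exceeds bound."""
    cutoff = bisect.bisect_right(dictionary, bound)
    return [row for row, code in enumerate(codes) if code >= cutoff]

dictionary, codes = compress_chunk(["ash", "oak", "elm", "fir", "oak"])
rows = filter_greater_than(dictionary, codes, "elm")  # rows with value > "elm"
```

The predicate is translated once into a condition on codes (the `cutoff`), after which the scan touches only the compressed representation, mirroring the claim's "generating, from the at least one predicate, a condition to evaluate against a particular columnar unit."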
1 Assignment
0 Petitions
Abstract
Techniques are described herein for filtering data from a table during an in-memory scan. Predicates are pushed down to the in-memory scan to avoid scanning unnecessary columnar units and to reduce the overhead of decompressing, row-stitching, and distributing data during evaluation. Techniques are described herein for generating, from complex predicates that have multiple conditions on the same column, implied predicates that have a condition on a single column and can be evaluated during an in-memory scan. Techniques are also described herein for reducing the overhead of a table scan when processing a join query. When redistributing a first table to perform a hash join, the nodes performing an in-memory scan of the first table may create a filter that tracks unique values from the join key. Data from the second table is processed and transferred to other nodes in the cluster only if its values from the join key pass through the filter.
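The join-filter idea from the abstract can be sketched as follows, using a plain Python set in place of a compact structure such as a Bloom filter. The row layout, predicate, and function names are assumptions for illustration only.

```python
# Sketch: while scanning the first table, record the distinct join-key
# values that survive the predicate; rows of the second table are
# processed and redistributed only if their join key passes that filter.

def scan_first_table(rows, predicate):
    """In-memory scan: filter rows and collect the surviving join keys."""
    survivors = [r for r in rows if predicate(r)]
    join_filter = {r["key"] for r in survivors}  # unique join-key values
    return survivors, join_filter

def prune_second_table(rows, join_filter):
    """Forward only rows whose join key passes the filter."""
    return [r for r in rows if r["key"] in join_filter]

first = [{"key": 1, "v": 10}, {"key": 2, "v": 3}, {"key": 3, "v": 8}]
second = [{"key": 1, "w": "a"}, {"key": 2, "w": "b"}, {"key": 4, "w": "c"}]
survivors, jf = scan_first_table(first, lambda r: r["v"] > 5)
pruned = prune_second_table(second, jf)
```

Rows of the second table whose join key cannot possibly match anything in the first table's intermediate results are dropped before any hashing or network transfer, which is the overhead reduction the abstract describes.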
202 Citations
20 Claims
1. A method comprising:
- prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating, from the at least one predicate, a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query;
wherein the method is performed by one or more nodes.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
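The final limitation of the claim, comparing hashed join-key values from the second table against the join keys of the first set of intermediate results, can be sketched as the build and probe phases of a generic hash join. Python's built-in `hash` and the row/field names here are assumptions, not the patent's method.

```python
# Sketch: build a hash table over the second table's join-key values
# (the second set of intermediate results), then probe it with the
# first set of intermediate results to produce the join results.
from collections import defaultdict

def build_hash_table(rows, key):
    """Bucket rows by the hash of their join-key value."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[hash(r[key])].append(r)
    return buckets

def probe(buckets, rows, key):
    """Compare each probe row's join key against the matching bucket."""
    results = []
    for r in rows:
        for match in buckets.get(hash(r[key]), []):
            if match[key] == r[key]:  # recheck to guard against collisions
                results.append({**r, **match})
    return results

first_results = [{"key": 1, "v": 10}, {"key": 3, "v": 8}]
second_rows = [{"key": 1, "w": "a"}, {"key": 2, "w": "b"}]
buckets = build_hash_table(second_rows, "key")
joined = probe(buckets, first_results, "key")
```

In the claimed method the probe side is already reduced by the in-memory predicate filtering, so only surviving rows reach this comparison.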
11. One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of a method comprising the steps of:
prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating from the at least one predicate a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query.
- View Dependent Claims (12, 13, 14, 15, 16, 17)
18. A system comprising one or more computing devices configured to perform a process, comprising:
prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating from the at least one predicate a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query.
- View Dependent Claims (19, 20)
Specification