Techniques for evaluating query predicates during in-memory table scans
First Claim
1. A method comprising:
- prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating, from the at least one predicate, a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query;
wherein the method is performed by one or more nodes.
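The claim's core step, comparing compressed values against a condition without decompressing them, can be sketched with dictionary encoding, one common compression scheme for columnar units. This is an illustrative sketch, not the patent's implementation; all names are made up, and the claim does not mandate dictionary encoding.

```python
# Sketch: a range predicate on values becomes a range predicate on
# dictionary codes, so the scan never decodes a compressed value.
import bisect

def compress_chunk(values):
    """Dictionary-encode one chunk of a column (a 'columnar unit')."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes

def filter_greater_than(dictionary, codes, bound):
    """Return row positions where value > bound, comparing codes only.

    Because the dictionary is sorted, value > bound is equivalent to
    code >= the first code whose value exceeds bound."""
    cutoff = bisect.bisect_right(dictionary, bound)
    return [row for row, code in enumerate(codes) if code >= cutoff]

dictionary, codes = compress_chunk(["ash", "oak", "elm", "fir", "oak"])
rows = filter_greater_than(dictionary, codes, "elm")  # rows with value > "elm"
```

The predicate is translated once into a condition on codes (the `cutoff`), after which the scan touches only the compressed representation, mirroring the claim's "generating, from the at least one predicate, a condition to evaluate against a particular columnar unit."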
1 Assignment
0 Petitions
Abstract
Techniques are described herein for filtering data from a table during an in-memory scan. Predicates are pushed down to the in-memory scan to avoid scanning unnecessary columnar units and to reduce the overhead of decompressing, row-stitching, and distributing data during evaluation. Techniques are described herein for generating, from complex predicates that have multiple conditions on the same column, implied predicates that have a condition on a single column and can be evaluated during an in-memory scan. Techniques are also described herein for reducing the overhead of a table scan when processing a join query. When redistributing a first table to perform a hash join, the nodes performing an in-memory scan of the first table may create a filter that tracks unique values from the join key. Data from the second table is processed and transferred to other nodes in the cluster only if its values from the join key pass through the filter.
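The join-filter idea from the abstract can be sketched as follows, using a plain Python set in place of a compact structure such as a Bloom filter. The row layout, predicate, and function names are assumptions for illustration only.

```python
# Sketch: while scanning the first table, record the distinct join-key
# values that survive the predicate; rows of the second table are
# processed and redistributed only if their join key passes that filter.

def scan_first_table(rows, predicate):
    """In-memory scan: filter rows and collect the surviving join keys."""
    survivors = [r for r in rows if predicate(r)]
    join_filter = {r["key"] for r in survivors}  # unique join-key values
    return survivors, join_filter

def prune_second_table(rows, join_filter):
    """Forward only rows whose join key passes the filter."""
    return [r for r in rows if r["key"] in join_filter]

first = [{"key": 1, "v": 10}, {"key": 2, "v": 3}, {"key": 3, "v": 8}]
second = [{"key": 1, "w": "a"}, {"key": 2, "w": "b"}, {"key": 4, "w": "c"}]
survivors, jf = scan_first_table(first, lambda r: r["v"] > 5)
pruned = prune_second_table(second, jf)
```

Rows of the second table whose join key cannot possibly match anything in the first table's intermediate results are dropped before any hashing or network transfer, which is the overhead reduction the abstract describes.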
202 Citations
20 Claims
1. A method comprising:
- prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating, from the at least one predicate, a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query;
wherein the method is performed by one or more nodes.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
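The final limitation of the claim, comparing hashed join-key values from the second table against the join keys of the first set of intermediate results, can be sketched as the build and probe phases of a generic hash join. Python's built-in `hash` and the row/field names here are assumptions, not the patent's method.

```python
# Sketch: build a hash table over the second table's join-key values
# (the second set of intermediate results), then probe it with the
# first set of intermediate results to produce the join results.
from collections import defaultdict

def build_hash_table(rows, key):
    """Bucket rows by the hash of their join-key value."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[hash(r[key])].append(r)
    return buckets

def probe(buckets, rows, key):
    """Compare each probe row's join key against the matching bucket."""
    results = []
    for r in rows:
        for match in buckets.get(hash(r[key]), []):
            if match[key] == r[key]:  # recheck to guard against collisions
                results.append({**r, **match})
    return results

first_results = [{"key": 1, "v": 10}, {"key": 3, "v": 8}]
second_rows = [{"key": 1, "w": "a"}, {"key": 2, "w": "b"}]
buckets = build_hash_table(second_rows, "key")
joined = probe(buckets, first_results, "key")
```

In the claimed method the probe side is already reduced by the in-memory predicate filtering, so only surviving rows reach this comparison.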
11. One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of a method comprising the steps of:
prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating from the at least one predicate a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query.
- View Dependent Claims (12, 13, 14, 15, 16, 17)
18. A system comprising one or more computing devices configured to perform a process, comprising:
prior to receiving a query, with at least one predicate, that requires work to be performed on a first table, performing the steps of:
dividing data from the first table into a plurality of chunks;
populating, into a volatile memory of at least one host node, the plurality of chunks;
compressing each given chunk of the plurality of chunks into a given plurality of columnar units;
in response to receiving the query with the at least one predicate, wherein the query with at least one predicate is a join query with a join key that corresponds to a first column from the first table and a second column from a second table, performing:
generating from the at least one predicate a condition to evaluate against a particular columnar unit;
wherein the particular columnar unit is a columnar unit from the given plurality of columnar units that were generated from a given chunk of the plurality of chunks;
wherein the particular columnar unit stores compressed values from a portion of the first column of the first table;
during an in-memory scan of at least a portion of the first table, without decompressing the compressed values in the particular columnar unit, comparing data from the particular columnar unit with the condition;
based on the comparison, filtering data items from the particular columnar unit to produce a first set of intermediate results for the query;
comparing data items from a join key of a second set of intermediate results, created after applying a hash function used in a hash join to values from the second column of the second table, to the data items from the join key of the first set of intermediate results to generate a set of results for the join query.
- View Dependent Claims (19, 20)
Specification