DISK-BASED HASH JOIN PROCESS

US 20140250142A1
Filed: 03/01/2013
Published: 09/04/2014
Est. Priority Date: 03/01/2013
Status: Active Grant


6 Assignments


0 Petitions


Abstract

A database system performs a hash join process for processing queries that join an inner and an outer database table. The hash join process builds a hash table in memory for the inner table. The database system receives a limit on the memory for storing the hash table and maximizes the number of partitions of the hash table stored in memory. If the hash table exceeds the memory limit while rows from the inner table are being added, the database system selects a partition for spilling to persistent storage. The partition selected for spilling may be the largest partition or a partition larger than most of the other partitions. The database system initializes the hash table with a number of partitions substantially equal to half the total number of blocks that fit within the specified memory limit for the hash table.
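The build-phase behavior summarized above can be sketched in Python. This is a minimal simulation under stated assumptions, not the patented implementation: memory is counted in blocks that each hold a hypothetical `BLOCK_CAPACITY` rows; the table starts with half as many partitions as the block budget allows, each with two hash buckets (claims 5 and 7); partition size is its number of in-memory blocks (claim 9); the largest in-memory partition is the one spilled (claim 2); and a plain list stands in for persistent storage. Routing later rows of an already-spilled partition straight to that list is also an assumption. `SpillingHashTable`, `Partition`, and `memory_limit_blocks` are illustrative names.

```python
from dataclasses import dataclass, field

BLOCK_CAPACITY = 4  # hypothetical: rows that fit in one memory block


@dataclass
class Partition:
    buckets: list                             # each bucket is a list of blocks (lists of rows)
    spilled: bool = False
    disk: list = field(default_factory=list)  # stand-in for persistent storage

    def size_in_blocks(self):
        # Partition size measured as the number of blocks it holds in memory.
        return sum(len(bucket) for bucket in self.buckets)


class SpillingHashTable:
    """Hypothetical build-phase sketch with a memory limit counted in blocks."""

    BUCKETS_PER_PARTITION = 2

    def __init__(self, memory_limit_blocks, key=lambda row: row[0]):
        self.limit = memory_limit_blocks
        self.key = key
        # Initial partition count: half the number of blocks the limit allows.
        self.num_partitions = max(1, memory_limit_blocks // 2)
        self.partitions = [
            Partition([[] for _ in range(self.BUCKETS_PER_PARTITION)])
            for _ in range(self.num_partitions)
        ]
        self.blocks_used = 0

    def _locate(self, row):
        h = hash(self.key(row))
        part = self.partitions[h % self.num_partitions]
        bucket_index = (h // self.num_partitions) % self.BUCKETS_PER_PARTITION
        return part, bucket_index

    def _largest_in_memory(self):
        resident = [p for p in self.partitions if not p.spilled]
        return max(resident, key=Partition.size_in_blocks)

    def _spill(self, part):
        # Write every in-memory row of the partition to "disk" and free its blocks.
        for bucket in part.buckets:
            for block in bucket:
                part.disk.extend(block)
        self.blocks_used -= part.size_in_blocks()
        part.buckets = [[] for _ in range(self.BUCKETS_PER_PARTITION)]
        part.spilled = True

    def insert(self, row):
        part, b = self._locate(row)
        if part.spilled:
            part.disk.append(row)  # assumption: rows of a spilled partition go straight to disk
            return
        bucket = part.buckets[b]
        if bucket and len(bucket[-1]) < BLOCK_CAPACITY:
            bucket[-1].append(row)  # an existing block still has room
            return
        # A new block is needed: spill the largest partitions until it fits.
        while self.blocks_used + 1 > self.limit:
            victim = self._largest_in_memory()
            self._spill(victim)
            if victim is part:
                part.disk.append(row)
                return
        bucket.append([row])  # reuse the freed space for the new row
        self.blocks_used += 1


table = SpillingHashTable(memory_limit_blocks=8)
for k in range(40):
    table.insert((k, f"inner-{k}"))
print([i for i, p in enumerate(table.partitions) if p.spilled], table.blocks_used)
# prints: [0, 1] 8
```

With a budget of 8 blocks (4 partitions of 2 buckets) and 40 integer-keyed rows, the run spills partitions 0 and 1 and finishes with all 8 blocks held by partitions 2 and 3. Spilling the largest partition frees the most blocks per spill; claim 3 instead allows picking any partition from a set of the larger ones.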


20 Claims

1. A computer-implemented method for processing a database query, the method comprising:
- receiving a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table;
  
  receiving a limit on memory used for storing a hash table;
  
  building the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function;
  
  receiving a request to add data of a new row of the inner table to the hash table;
  
  determining whether addition of data of the new row will cause the hash table to exceed the memory limit;
  
  responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, selecting a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table;
  
  spilling the selected partition to the persistent storage area, the spilling comprising storing data from the selected partition in the persistent storage area; and
  
  reusing memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.
- Dependent claims 2-18
- - 2. The computer-implemented method of claim 1, wherein the selecting of the partition comprises selecting the largest partition for reusing the memory space occupied by the partition.
  - 3. The computer-implemented method of claim 1, wherein the selecting of the partition comprises:
    - identifying a set of partitions of the hash table comprising partitions that are larger than the remaining partitions of the hash table; and
      
      selecting one of the partitions from the set of the partitions for spilling to the persistent storage.
  - 4. The computer-implemented method of claim 1, further comprising:
    - responsive to receiving the request to add data of the new row to the hash table, determining whether the data of the new row can be added to an existing block of the hash table; and
      
      responsive to determining that none of the existing blocks of the hash table has capacity to store data from the new row, determining whether a new block can be added to the hash table.
  - 5. The computer-implemented method of claim 1, wherein building the hash table comprises determining an initial number of partitions of the hash table as substantially equal to half of the number of blocks corresponding to the memory limit.
  - 6. The computer-implemented method of claim 1, wherein building the hash table comprises determining an initial number of partitions of the hash table as a value within a small threshold value of half of the number of blocks corresponding to the memory limit.
  - 7. The computer-implemented method of claim 1, wherein building the hash table comprises allocating two hash buckets per partition of the hash table.
  - 8. The computer-implemented method of claim 1, wherein building the hash table comprises allocating two hash buckets per partition for a substantial number of partitions of the hash table.
  - 9. The computer-implemented method of claim 1, wherein the size of a partition is determined as the number of blocks stored in the hash buckets of the partition.
  - 10. The computer-implemented method of claim 1, further comprising:
    - responsive to receiving the request to add data of the new row to the hash table, identifying a hash code value by applying a hash function to data of the new row;
      
      selecting a hash bucket of the hash table corresponding to the hash code; and
      
      selecting a block of data from the selected hash bucket for adding the data of the new row.
  - 11. The computer-implemented method of claim 1, further comprising, for each partition, storing a measure of the amount of data stored in each partition.
  - 12. The computer-implemented method of claim 1, further comprising:
    - responsive to completion of the processing of rows of the inner table for building the hash table, determining whether at least one partition of the hash table was selected for reusing the memory space of the partition; and
      
      responsive to determining that at least one partition of the hash table was selected for reusing the memory space, rebuilding the hash table by adding partitions to the hash table based on sizes of the partitions.
  - 13. The computer-implemented method of claim 12, wherein rebuilding the hash table comprises:
    - selecting a plurality of partitions for adding to the hash table, wherein the plurality of partitions are smaller than remaining partitions.
  - 14. The computer-implemented method of claim 1, wherein building a hash table comprises:
    - receiving an estimate of size of data of the inner table;
      
      determining whether the inner table can be entirely stored in the hash table based on the estimate of the size of data; and
      
      responsive to determining that the inner table can be accommodated in the hash table, building the hash table with a single partition.
  - 15. The computer-implemented method of claim 1, further comprising:
    - receiving rows from the outer table; and
      
      for each row of the outer table, finding a matching row of the inner table using the hash table.
  - 16. The computer-implemented method of claim 15, further comprising:
    - determining that a particular row of the outer table matches a row of the inner table from a partition selected for reusing the memory space occupied by the partition;
      
      storing information describing the particular row on persistent storage in a left over partition; and
      
      for each row of the outer table, finding a matching row of the inner table using the hash table.
  - 17. The computer-implemented method of claim 16, further comprising:
    - performing a join operation for each row of the outer table stored in the left over partition and rows of each partition of the inner table that was selected for reusing the memory space.
  - 18. The computer-implemented method of claim 15, further comprising:
    - responsive to determining that the number of rows remaining to be processed for the outer table is less than the number of rows remaining to be processed for the inner table, performing a hash join of the remaining rows of the inner table and the outer table by building a hash table using the remaining rows of the outer table.

19. A computer-implemented system for processing a database query, the system comprising:
- a computer processor; and
  
  a computer-readable storage medium storing computer program modules configured to execute on the computer processor, the computer program modules comprising:
  
  a database system configured to:
  
  receive a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table;
  
  receive a limit on memory used for storing a hash table;
  
  build the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function;
  
  receive a request to add data of a new row of the inner table to the hash table;
  
  determine whether addition of data of the new row will cause the hash table to exceed the memory limit;
  
  responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, select a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table;
  
  spill the selected partition to the persistent storage area by storing data from the selected partition in the persistent storage area; and
  
  reuse memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.

20. A computer program product having a non-transitory computer-readable storage medium storing computer-executable code for processing a database query, the code comprising:
- a database system configured to:
  
  receive a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table;
  
  receive a limit on memory used for storing a hash table;
  
  build the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function;
  
  receive a request to add data of a new row of the inner table to the hash table;
  
  determine whether addition of data of the new row will cause the hash table to exceed the memory limit;
  
  responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, select a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table;
  
  spill the selected partition to the persistent storage area by storing data from the selected partition in the persistent storage area; and
  
  reuse memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.
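Claims 12 to 17 describe what happens after the build: outer rows are probed against the in-memory hash table, outer rows whose partition was spilled are kept in a left-over partition, and the spilled inner partitions are later reloaded, smaller ones first, to finish the join. The Python sketch that follows models that flow under simplifying assumptions: the set of spilled partitions is passed in rather than computed, "disk" is an ordinary dict, the memory budget is counted in rows and applied only in the second pass, and claim 18's role reversal is not modeled. `hash_join`, `spilled_ids`, and `memory_limit_rows` are hypothetical names, not an interface from the patent.

```python
from collections import defaultdict


def hash_join(inner_rows, outer_rows, num_partitions, spilled_ids,
              memory_limit_rows, key=lambda row: row[0]):
    """Hypothetical sketch of the probe phase plus a second pass over spilled
    partitions; spilled_ids is assumed to come from an earlier build phase."""

    def partition_of(row):
        return hash(key(row)) % num_partitions

    resident = defaultdict(list)       # join key -> inner rows kept in memory
    spilled_inner = defaultdict(list)  # partition id -> inner rows on "disk"
    for r in inner_rows:
        p = partition_of(r)
        if p in spilled_ids:
            spilled_inner[p].append(r)
        else:
            resident[key(r)].append(r)

    results = []
    left_over = defaultdict(list)      # partition id -> deferred outer rows
    for o in outer_rows:
        p = partition_of(o)
        if p in spilled_ids:
            left_over[p].append(o)     # its inner matches are on disk: defer
        else:
            results.extend((i, o) for i in resident.get(key(o), []))

    # Second pass: reload spilled partitions smallest first, packing several
    # into one rebuilt table while they fit (at least one per pass).
    pending = sorted(spilled_inner, key=lambda q: len(spilled_inner[q]))
    while pending:
        batch, used = [], 0
        while pending and (not batch or
                           used + len(spilled_inner[pending[0]]) <= memory_limit_rows):
            q = pending.pop(0)
            batch.append(q)
            used += len(spilled_inner[q])
        rebuilt = defaultdict(list)
        for q in batch:
            for r in spilled_inner[q]:
                rebuilt[key(r)].append(r)
        for q in batch:
            for o in left_over.get(q, []):
                results.extend((i, o) for i in rebuilt.get(key(o), []))
    return results


inner = [(k, f"inner-{k}") for k in range(10)]
outer = [(k, f"outer-{k}") for k in range(0, 20, 2)]
print(len(hash_join(inner, outer, num_partitions=4, spilled_ids={1, 2},
                    memory_limit_rows=2)))
```

Because each join key maps to exactly one partition, probing a rebuilt table that holds several small partitions at once still yields only matches from the outer row's own partition. For the sample data the call returns the 5 pairs whose keys are 0, 2, 4, 6, and 8, the same result as a nested-loop join.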


Current Assignee
Paraccel Incorporated (HCL Technologies Limited)
Original Assignee
Paraccel Incorporated (HCL Technologies Limited)
Inventors
Pradhan, Mayank, Galimberti, David, Chu, Brian Pak-Ning, Wilhite, David Jr., Birnbaum, Adam, Dyskant, Raymi

Granted Patent

US 9,275,110 B2
US Class Current

707/765
CPC Class Codes

G06F 12/1018   involving hashing technique...

G06F 16/2255   Hash tables

G06F 16/2453   Query optimisation

G06F 16/2456   Join operations

G06F 16/284   Relational databases
