DISK-BASED HASH JOIN PROCESS
First Claim
1. A computer-implemented method for processing a database query, the method comprising:
- receiving a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table;
receiving a limit on memory used for storing a hash table;
building the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function;
receiving a request to add data of a new row of the inner table to the hash table;
determining whether addition of data of the new row will cause the hash table to exceed the memory limit;
responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, selecting a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table;
spilling the selected partition to the persistent storage area, the spilling comprising, storing data from the selected partition in the persistent storage area; and
reusing memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.
6 Assignments
0 Petitions
Accused Products
Abstract
A database system performs hash join process for processing queries that join an inner and an outer database table. The hash join processes builds a hash table in memory for the inner table. The database system receives a limit on the memory for storing the hash table. The database system maximizes the number of partitions stored in memory for the hash table. If the hash table exceeds the limit of the memory while adding rows from the inner table, the database system selects a partition for spilling to a persistent storage. The partition selected for spilling to may be the largest partition or a partition larger than most of the partitions. The database system initializes the hash table to a number of partitions that is substantially equal to half of the total number of blocks that can be stored within the specified limit of memory for the hash table.
-
Citations
20 Claims
-
1. A computer-implemented method for processing a database query, the method comprising:
-
receiving a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table; receiving a limit on memory used for storing a hash table; building the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function; receiving a request to add data of a new row of the inner table to the hash table; determining whether addition of data of the new row will cause the hash table to exceed the memory limit; responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, selecting a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table; spilling the selected partition to the persistent storage area, the spilling comprising, storing data from the selected partition in the persistent storage area; and reusing memory space obtained from spilling the selected partition to persistent storage for storing data of the new row. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A computer implemented system for processing a database query, the system comprising:
-
a computer processor; and a computer-readable storage medium storing computer program modules configured to execute on the computer processor, the computer program modules comprising; a database system configured to; receive a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table; receive a limit on memory used for storing a hash table; build the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function; receive a request to add data of a new row of the inner table to the hash table; determine whether addition of data of the new row will cause the hash table to exceed the memory limit; responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, select a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table; spill the selected partition to the persistent storage area by storing data from the selected partition in the persistent storage area; and reuse memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.
-
-
20. A computer program product having a non-transitory computer-readable storage medium storing computer-executable code for processing a database query, the code comprising:
a database system configured to; receive a request to perform a database query specifying a join of an inner table and an outer table, wherein the inner table is smaller than the outer table; receive a limit on memory used for storing a hash table; build the hash table using data from rows of the inner table, the hash table comprising one or more partitions, each partition comprising one or more hash buckets, each hash bucket storing data from rows that map to a hash code value based on a hashing function; receive a request to add data of a new row of the inner table to the hash table; determine whether addition of data of the new row will cause the hash table to exceed the memory limit; responsive to determining that addition of data of the new row will cause the hash table to exceed the memory limit, select a partition of the hash table for spilling to a persistent storage area, the selecting based on whether the size of the selected partition exceeds sizes of at least a plurality of other partitions of the hash table; spill the selected partition to the persistent storage area by storing data from the selected partition in the persistent storage area; and reuse memory space obtained from spilling the selected partition to persistent storage for storing data of the new row.
Specification