Parallel build of non-partitioned join hash tables and non-enforced N:1 join hash tables
First Claim
1. A method for building a hash table over a subset of data in a data set comprising:
- mapping keys in the data set to values in the data set using multiple parallel computation threads;
each thread scanning a subset of the keys and values and partitioning the subset of the keys and values into a plurality of partitions;
determining a cumulative count for a number of keys and values in each partition;
forming a hash table with space reserved for each partition based on the determined cumulative counts;
each thread selecting one or more partitions and inserting keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions; and
creating a compact hash table comprising a bitmap and a compacted array.
1 Assignment
0 Petitions
Accused Products
Abstract
A method for building a hash table over a subset of data in a data set includes mapping keys in the data set to values in the data set using multiple parallel computation threads. Each thread scans a subset of the keys and values and partitioning the subset of the keys and values into multiple partitions. A cumulative count for keys and values in each partition is determined. A hash table with space reserved for each partition is formed based on the determined cumulative counts. Each thread selects one or more partitions and inserts keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions.
-
Citations
15 Claims
-
1. A method for building a hash table over a subset of data in a data set comprising:
-
mapping keys in the data set to values in the data set using multiple parallel computation threads; each thread scanning a subset of the keys and values and partitioning the subset of the keys and values into a plurality of partitions; determining a cumulative count for a number of keys and values in each partition; forming a hash table with space reserved for each partition based on the determined cumulative counts; each thread selecting one or more partitions and inserting keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions; and creating a compact hash table comprising a bitmap and a compacted array. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computer program product for building a hash table over a subset of data in a data set, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by a processor to:
-
map, by the processor, hash keys in the data set to values in the data set using multiple parallel computation threads; scan, by each thread, a subset of the keys and values and partitions the subset of the keys and values into a plurality of partitions; determine, by the processor, a cumulative count for a number of keys and values in each partition; form, by the processor, a hash table with space reserved for each partition based on the determined cumulative counts; select, by each thread, one or more partitions and inserts its keys and values into the hash table in the reserved space for those partitions; and create, by the processor, a compact hash table comprising a bit a and a compacted array. - View Dependent Claims (11, 12, 13, 14, 15)
-
Specification