PARALLEL BUILD OF NON-PARTITIONED JOIN HASH TABLES AND NON-ENFORCED N:1 JOIN HASH TABLES
First Claim
1. A method for building a hash table over a subset of data in a data set comprising:
- mapping keys in the data set to values in the data set using multiple parallel computation threads;
each thread scanning a subset of the keys and values and partitioning the subset of the keys and values into a plurality of partitions;
determining a cumulative count for a number of keys and values in each partition;
forming a hash table with space reserved for each partition based on the determined cumulative counts; and
each thread selecting one or more partitions and inserting keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions.
1 Assignment
0 Petitions
Accused Products
Abstract
A method for building a hash table over a subset of data in a data set includes mapping keys in the data set to values in the data set using multiple parallel computation threads. Each thread scans a subset of the keys and values and partitioning the subset of the keys and values into multiple partitions. A cumulative count for keys and values in each partition is determined. A hash table with space reserved for each partition is formed based on the determined cumulative counts. Each thread selects one or more partitions and inserts keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions.
-
Citations
20 Claims
-
1. A method for building a hash table over a subset of data in a data set comprising:
-
mapping keys in the data set to values in the data set using multiple parallel computation threads; each thread scanning a subset of the keys and values and partitioning the subset of the keys and values into a plurality of partitions; determining a cumulative count for a number of keys and values in each partition; forming a hash table with space reserved for each partition based on the determined cumulative counts; and each thread selecting one or more partitions and inserting keys and values belonging to the selected one or more partitions into the hash table in the reserved space for those partitions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computer program product for building a hash table over a subset of data in a data set, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code readable/executable by a processor to:
-
map hash keys in the data set to values in the data set using multiple parallel computation threads; each thread scans a subset of the keys and values and partitions the subset of the keys and values into a plurality of partitions; determine a cumulative count for a number of keys and values in each partition; form a hash table with space reserved for each partition based on the determined cumulative counts; and each thread selects one or more partitions and inserts its keys and values into the hash table in the reserved space for those partitions. - View Dependent Claims (11, 12, 13, 14, 15)
-
-
16. A method for creating a compact hash table comprising:
-
a thread executing using a processor for scanning a subset of keys and values, and for each key, performing a hash operation and inserting hashed keys into a bitmap structure; determining cumulative population counts of keys and values within the bitmap; repeating scanning of the subset of the keys and values; and inserting the keys and values into a compacted array using the cumulative population counts. - View Dependent Claims (17, 18, 19, 20)
-
Specification