Early hash join
First Claim
1. A method for joining relations comprising:
- receiving a first relation and a second relation;
generating a dual hash table, having a first side and a second side;
reading a first tuple from the second relation;
probing the first side of the dual hash table;
outputting a result of the probe; and
inserting the first tuple into the second side of the dual hash table.
1 Assignment
0 Petitions
Accused Products
Abstract
Minimizing both the response time to produce the first few thousand results and the overall execution time is important for interactive querying. Current join algorithms either minimize the execution time at the expense of response time or minimize response time by producing results early without optimizing the total time. Disclosed herein is a hash-based join algorithm, called early hash join, which can be dynamically configured at any point during join processing to tradeoff faster production of results for overall execution time. Varying how inputs are read has a major effect on these two factors and provide formulas that allow an optimizer to calculate the expected rate of join output and the number of I/O operations performed using different input reading strategies.
37 Citations
20 Claims
-
1. A method for joining relations comprising:
-
receiving a first relation and a second relation;
generating a dual hash table, having a first side and a second side;
reading a first tuple from the second relation;
probing the first side of the dual hash table;
outputting a result of the probe; and
inserting the first tuple into the second side of the dual hash table. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A method for joining relations comprising:
-
receiving a first relation and a second relation;
generating a dual hash table, having a first side and a second side;
determining if a tuple exists in either the first relation or the second relation;
initiating a clean up phase if there are no tuples remaining to be read from the relations, wherein the clean up phase produces any remaining results that were missed when input was being read;
reading a first tuple from one of the relations according to one of a plurality of reading policies wherein the reading policy determines which of the first or second relations is read;
probing the dual hash table;
determining if memory is full; and
initiating a memory overflow process if memory is full, wherein the memory overflow process comprises a biased flushing policy that favors the smaller relation. - View Dependent Claims (18)
-
-
19. A system for joining relations comprising:
-
a memory for storing relations wherein the relations comprise tuples;
a processor coupled to the memory, wherein the processor is configured to perform the steps of;
receiving a first relation and a second relation;
generating a dual hash table, having a first side and a second side;
determining if a tuple exists in either the first relation or the second relation;
initiating a clean up phase if there are no tuples remaining to be read from the relations, wherein the clean up phase produces any remaining results that were missed when input was being read;
reading a first tuple from one of the relations according to one of a plurality of reading policies wherein the reading policy determines which of the first or second relations is read;
probing the dual hash table;
determining if memory is full; and
initiating a memory overflow process if memory is full, wherein the memory overflow process comprises a biased flushing policy that favors the smaller relation. - View Dependent Claims (20)
-
Specification