System and method for hybrid hash join using over-partitioning to respond to database query
First Claim
1. A computer system, comprising:
- at least one computer having at least one main memory defining a memory size, the computer accessing at least one data storage disk;
one or more input devices associated with the computer for generating a user query for data stored in at least one probe table and at least one build table defining a build table size, the tables being accessible by the computer;
logic means executable by the computer for establishing build partitions in the main memory and the disk when the build table size is greater than the main memory size, the logic means having;
partition means for determining a number “
N”
of build partitions B1, B2, . . . ,BN of the build table;
hash means for applying a hash function to the build table to establish build partitions B1, B2, . . . ,BN;
write means for writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory;
spill means for selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory and for spilling the victim partition to the disk; and
packing means for designating, after all build partitions B1, B2, . . . ,BN have been written to main memory or spilled to disk and before a probe phase is undertaken, at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for joining a build table to a probe table in response to a query for data includes over partitioning the build table into “N” build partitions using a uniform hash function and writing the build partitions into main memory of a database computer. When the main memory becomes full, one or more partitions is selected as a victim partition to be written to disk storage, and the process continues until all build table rows or tuples have either been written into main memory or spilled to disk. Then, a packing algorithm is used to initially designate never-spilled partitions as “winners” and spilled partitions as “losers”, and then to randomly select one or more winners for prospective swapping with one or more losers. The I/O savings associated with each prospective swap is determined and if any savings would be realized, the winners are designated as losers the losers are designated as winners. The swap determination can be made multiple times, e.g., 256, after which losers are moved entirely to disk and winners are moved entirely to memory. At the end of the swapping, probe table rows associated with winner partitions are joined to rows in the winner build partitions while probe table rows associated with loser partitions are spilled to disk. Then, the loser build partitions are written to main memory for joining with corresponding probe table partitions, to undertake the requested join of the build table and probe table in an I/O- and memory-efficient manner.
35 Citations
40 Claims
-
1. A computer system, comprising:
-
at least one computer having at least one main memory defining a memory size, the computer accessing at least one data storage disk;
one or more input devices associated with the computer for generating a user query for data stored in at least one probe table and at least one build table defining a build table size, the tables being accessible by the computer;
logic means executable by the computer for establishing build partitions in the main memory and the disk when the build table size is greater than the main memory size, the logic means having;
partition means for determining a number “
N”
of build partitions B1, B2, . . . ,BN of the build table;
hash means for applying a hash function to the build table to establish build partitions B1, B2, . . . ,BN;
write means for writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory;
spill means for selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory and for spilling the victim partition to the disk; and
packing means for designating, after all build partitions B1, B2, . . . ,BN have been written to main memory or spilled to disk and before a probe phase is undertaken, at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
first means for determining whether the main memory contains more than “
T”
blocks of at least one previously spilled partition, and if so, designating at least a largest one of any such previously spilled partitions as a victim partition.
-
-
7. The computer system of claim 6, wherein the spill means further comprises:
second means for determining, when the first means fails to designate any victim partitions, whether the main memory contains more than “
T”
blocks of at least one previously unspilled partition, and if so, designating at least a largest one of any such previously unspilled partitions as a victim partition.
-
8. The computer system of claim 7, wherein the spill means further comprises:
means for designating as a victim partition, when the second means for determining fails to designate any victim partitions, a largest partition in main memory with less than “
T”
blocks in main memory.
-
9. The computer system of claim 1, wherein the packing means further includes:
-
initialization means for initializing at least some build partitions as winners and at least some build partitions as losers;
prospective swap means for randomly selecting one or more test winners and one or more test losers for a prospective swap, wherein test winners and test losers would exchange designations; and
swap determining means for determining an efficiency savings of the prospective swap and designating test winners as losers and test losers as winners based on the efficiency savings.
-
-
10. The computer system of claim 9, wherein the initialization means initializes never-spilled partitions as winners and spilled partitions as losers, and the packing means further includes means for swapping a single winner for more than one loser and swapping a single loser for more than one winner.
-
11. A computer-implemented method for efficiently joining a probe table to a build table in response to a database query for information, comprising:
-
establishing plural winner partitions of the build table in a computer main memory, prior to joining the probe table with the build table;
establishing one or more loser partitions of the build table on a computer disk; and
substantially optimizing a mix of winner partitions and loser partitions prior to joining the tables, including swapping at least one single winner partition for at least one loser partition and vice-versa;
overpartitioning the build table into plural build partitions, more than one of which is written in a computer main memory prior to the probe phase and at least one of which is at least partially written to at least one computer disk when the main memory becomes substantially full;
proposing one or more prospective swaps, wherein one or more partitions in main memory is a candidate for removal to the disk pursuant to the swap and one or more partitions on the disk is a candidate for removal to the main memory pursuant to the swap; and
executing the swap if an efficiency is realized thereby. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
determining a number “
N”
of build partitions B1, B2, . . . ,BN of the build table;
applying a uniform hash function to the build table to establish the build partitions B1, B2, . . . ,BN;
writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory;
selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory;
spilling the victim partition to the disk; and
after all build partitions B1, B2, . . . ,BN have been written to main memory or spilled to disk and before a probe phase is undertaken, designating at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk.
-
-
14. The method of claim 13, further comprising accessing a spill number “
- K” and
determining the number “
N”
of build partitions based at least in part on a product of the spill number “
K” and
a main memory overhead number “
SpillOverhead”
associated with staging, in main memory, at least one victim partition during the probe phase.
- K” and
-
15. The method of claim 14, wherein the step of selecting at least one build partition as a victim partition further comprises:
determining whether the main memory contains more than “
T”
blocks of at least one previously spilled partition, and if so, designating at least a largest one of any such previously spilled partitions as a victim partition.
-
16. The method of claim 15, wherein the step of selecting at least one build partition as a victim partition further comprises:
when the main memory does not contain more than “
T”
blocks of a previously spilled partition, determining whether the main memory contains more than “
T”
blocks of at least one previously unspilled partition, and if so, designating at least a largest one of any such previously unspilled partitions as a victim partition.
-
17. The method of claim 16, wherein the step of selecting at least one build partition as a victim partition further comprises:
when the main memory does not contain more than “
T”
blocks of a previously unspilled partition, designating as a victim partition a largest partition in main memory with less than “
T”
blocks in main memory.
-
18. The method of claim 17, wherein the step of designating at least some partitions as winners and at least some partitions as losers further includes:
-
initializing never-spilled build partitions as winners and initializing spilled build partitions as losers;
randomly selecting one or more test winners and one or more test losers for a prospective swap, wherein test winners and test losers would exchange designations; and
determining an efficiency savings of the prospective swap and designating test winners as losers and test losers as winners based on the efficiency savings.
-
-
19. The method of claim 18, further comprising:
-
writing winners from disk to main memory; and
writing losers from main memory to disk.
-
-
20. The method of claim 13, wherein after the victim partition is spilled to disk, blocks of the victim partition subsequently processed during the step of applying a uniform hash function to the build table are written to main memory unless and until the victim partition is selected as a victim partition a second time.
-
21. A computer program device comprising:
-
a computer program storage device readable by a digital processing apparatus; and
a program means on the program storage device and including instructions executable by the digital processing apparatus for performing method steps for joining a probe table to a build table in a probe phase in response to a database query for information, the method steps comprising;
overpartitioning the build table into plural build partitions, more than one of which is written in a computer main memory prior to the probe phase and at least one of which is at least partially written to at least one computer disk when the main memory becomes substantially full;
proposing one or more prospective swaps, wherein one or more partitions in main memory is a candidate for removal to the disk pursuant to the swap and one or more partitions on the disk is a candidate for removal to the main memory pursuant to the swap; and
executing the swap if an efficiency is realized thereby. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28)
determining a number “
N”
of build partitions B1, B2, . . . ,BN of the build table;
applying a uniform hash function to the build table to establish the build partitions B1, B2, . . . ,BN; and
writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory.
-
-
23. The computer program device of claim 22, wherein the method steps further comprise:
-
selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory; and
spilling the victim partition to the disk.
-
-
24. The computer program device of claim 23, wherein the proposing step further includes:
designating at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk, wherein a single winner partition can be swapped for more than one loser partition and a single loser partition can be swapped for more than one winner partition.
-
25. The computer program device of claim 24, wherein the step of determining a number “
- N”
of build partitions further comprises accessing a spill number “
K” and
determining the number “
N”
of build partitions based at least in part on a product of the spill number “
K” and
a main memory overhead number “
SpillOverhead”
associated with staging, in main memory, at least one victim partition during the probe phase.
- N”
-
26. The computer program device of claim 23, wherein the step of selecting at least one build partition as a victim partition further comprises:
-
determining whether the main memory contains more than “
T”
blocks of at least one previously spilled partition, and if so, designating at least a largest one of any such previously spilled partitions as a victim partition;
when the main memory does not contain more than “
T”
blocks of a previously spilled partition, determining whether the main memory contains more than “
T”
blocks of at least one previously unspilled partition, and if so, designating at least a largest one of any such previously unspilled partitions as a victim partition; and
when the main memory does not contain more than “
T”
blocks of a previously unspilled partition, designating as a victim partition a largest partition in main memory with less than “
T”
blocks in main memory.
-
-
27. The computer program device of claim 24, wherein the step of designating at least some partitions as winners and at least some partitions as losers further includes:
-
initializing never-spilled build partitions as winners and initializing spilled build partitions as losers;
randomly selecting one or more test winners and one or more test losers for a prospective swap, wherein test winners and test losers would exchange designations; and
determining an efficiency savings of the prospective swap and designating test winners as losers and test losers as winners based on the efficiency savings.
-
-
28. The computer program device of claim 23, wherein after the victim partition is spilled to disk, blocks of the victim partition subsequently processed during the step of applying a uniform hash function to the build table are written to main memory unless and until the victim partition is selected as a victim partition a second time.
-
29. A computer system, comprising:
-
at least one computer having at least one main memory defining a memory size, the computer accessing at least one data storage disk;
one or more input devices associated with the computer for generating a user query for data stored in at least a probe table and a build table defining a build table size, the tables being accessible by the computer;
logic means executable by the computer for establishing build partitions in the main memory and the disk when the build table size is greater than the main memory size, the logic means undertaking;
establishing plural winner partitions of the build table in the main memory, prior to joining the probe table with the build table;
establishing one or more loser partitions of the build table in the disk;
optimizing a mix of winner partitions and loser partitions prior to joining the tables; and
accessing a spill number “
K” and
determining the number “
N”
of build partitions of the build table based at least in part on a product of the spill number “
K” and
a main memory overhead number “
SpillOverhead”
associated with staging, in main memory, at least one victim partition during the probe phase.- View Dependent Claims (30, 31, 32, 33)
applying a uniform hash function to the build table to establish build partitions B1, B2, . . . ,BN;
writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory;
selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory;
spilling the victim partition to the disk; and
after all build partitions B1, B2, . . . ,BN have been written to main memory or spilled to disk and before a probe phase is undertaken, designating at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk.
-
-
31. The computer system of claim 30, wherein the step of selecting at least one build partition as a victim partition undertaken by the logic means further comprises:
-
determining whether the main memory contains more than “
T”
blocks of at least one previously spilled partition, and if so, designating at least a largest one of any such previously spilled partitions as a victim partition;
when the main memory does not contain more than “
T”
blocks of a previously spilled partition, determining whether the main memory contains more than “
T”
blocks of at least one previously unspilled partition, and if so, designating at least a largest one of any such previously unspilled partitions as a victim partition; and
when the main memory does not contain more than “
T”
blocks of a previously unspilled partition, designating as a victim partition a largest partition in main memory with less than “
T”
blocks in main memory.
-
-
32. The computer system of claim 30, wherein the step of designating at least some partitions as winners and at least some partitions as losers undertaken by the logic means further includes:
-
initializing never-spilled build partitions as winners and initializing spilled build partitions as losers;
for one or more iterations, randomly selecting one or more test winners and one or more test losers for a prospective swap, wherein test winners and test losers would exchange designations; and
for each iteration, determining an efficiency savings of the prospective swap and designating test winners as losers and test losers as winners based on the efficiency savings;
after all iterations, writing losers from main memory to disk and writing winners from disk to main memory, wherein a single winner partition can be swapped for more than one loser partition and a single loser partition can be swapped for more than one winner partition.
-
-
33. The computer system of claim 30, wherein after the victim partition is spilled to disk, blocks of the victim partition subsequently processed during the step of applying a uniform hash function to the build table are written to main memory unless and until the victim partition is selected as a victim partition a second time.
-
34. A computer system, comprising:
-
at least one computer having at least one main memory defining a memory size, the computer having at least one data storage disk;
one or more input devices associated with the computer for generating a user query for data stored in at least a probe table and a build table defining a build table size, the tables being accessible by the computer;
logic means executable by the computer for establishing build partitions in the main memory and the disk when the build table size is greater than the main memory size, the logic means undertaking;
overpartitioning the build table into plural build partitions, more than one of which is written in the computer main memory prior to a probe phase and at least one of which is at least partially written to at least one computer disk when the main memory becomes substantially full;
proposing one or more prospective swaps, wherein one or more partitions in main memory is a candidate for removal to the disk pursuant to the swap and one or more partitions on the disk is a candidate for removal to the main memory pursuant to the swap; and
executing the swap if an efficiency is realized thereby. - View Dependent Claims (35, 36, 37, 38, 39, 40)
accessing a spill number “
K” and
determining the number “
N”
of build partitions B1, B2, . . . ,BN based at least in part on a product of the spill number “
K” and
a main memory overhead number “
SpillOverhead”
associated with staging, in main memory, at least one victim partition during the probe phase;
applying a uniform hash function to the build table to establish the build partitions B1, B2, . . . ,BN; and
writing at least portions of the build partitions B1, B2, . . . ,BN into the main memory.
-
-
36. The computer system of claim 35, wherein the method steps undertaken by the logic means further comprise:
-
selecting at least one build partition as a victim partition when the main memory is full prior to writing each partition in its entirety to main memory; and
spilling the victim partition to the disk.
-
-
37. The computer system of claim 36, wherein the proposing step undertaken by the logic means further includes:
designating at least some partitions as winners to be written from disk to main memory, and designating at least some partitions as losers to be written from main memory to disk.
-
38. The computer system of claim 37, wherein the step of selecting at least one build partition as a victim partition undertaken by the logic means further comprises:
-
determining whether the main memory contains more than “
T”
blocks of at least one previously spilled partition, and if so, designating at least a largest one of any such previously spilled partitions as a victim partition;
when the main memory does not contain more than “
T”
blocks of a previously spilled partition, determining whether the main memory contains more than “
T”
blocks of at least one previously unspilled partition, and if so, designating at least a largest one of any such previously unspilled partitions as a victim partition; and
when the main memory does not contain more than “
T”
blocks of a previously unspilled partition, designating as a victim partition a largest partition in main memory with less than “
T”
blocks in main memory.
-
-
39. The computer system of claim 38, wherein the step of designating at least some partitions as winners and at least some partitions as losers further includes:
-
initializing never-spilled build partitions as winners and initializing spilled build partitions as losers;
randomly selecting one or more test winners and one or more test losers for a prospective swap, wherein test winners and test losers would exchange designations; and
determining an efficiency savings of the prospective swap and designating test winners as losers and test losers as winners based on the efficiency savings, wherein a single winner partition can be swapped for more than one loser partition and a single loser partition can be swapped for more than one winner partition.
-
-
40. The computer system of claim 36, wherein after the victim partition is spilled to disk, blocks of the victim partition, subsequently processed during the step of applying a uniform hash function to the build table are written to main memory unless and until the victim partition is selected as a victim partition a second time.
Specification