System and method for performing I/O-efficient join processing
First Claim
1. A method for performing a d-dimensional join between a first set and a second set of hyper-rectangles, at least one of said first set and said second set being stored in secondary memory, comprising the steps of:
- dividing the d-dimensional join into k d-dimensional join strips and a (d−
1)-and-a-half dimensional join;
classifying hyper-rectangles within said strips as large if they are contained in more than a single strip and small if they are contained within said single strip;
partitioning each large hyper-rectangle into one center piece and two end pieces;
recursively computing intersections between a first type from said first set and a second type from said second set, wherein each of said first type and said second type is one selected from a group consisting of said end pieces and said small hyper-rectangles;
computing intersections between center pieces from said first set and second set and with said small hyper-rectangles from said first set and said second set by partitioning said (d−
1)-and-a-half dimensional join along d−
1 dimensions and by processing through said steps of dividing, classifying, partitioning and recursively computing intersections for each of said d-2 dimensions; and
reporting to the secondary memory all computed intersections.
1 Assignment
0 Petitions
Accused Products
Abstract
I/O-efficient methods and apparatus are provided for the d-dimensional join problem in one, two, and three dimensions, and are also generalized for arbitrary higher dimensions. Let N be the total number of rectangles in the two sets to be joined, M the total amount of memory available, B the disk block size, and T the total number of pairs in the output of the join. Define n=N/B, m=M/B, and t=T/B. For one and two dimensions, I/O-optimal join methods are provided that run in O(nlogmn+t) I/O operations and have utility to temporal and spatial database systems. For dimensions d≧3, methods are provided that run in O(nlogm(d−1) n+t) I/O operations, which is within a logm(d−2)n factor of the currently known lower bounds.
28 Citations
23 Claims
-
1. A method for performing a d-dimensional join between a first set and a second set of hyper-rectangles, at least one of said first set and said second set being stored in secondary memory, comprising the steps of:
-
dividing the d-dimensional join into k d-dimensional join strips and a (d−
1)-and-a-half dimensional join;
classifying hyper-rectangles within said strips as large if they are contained in more than a single strip and small if they are contained within said single strip;
partitioning each large hyper-rectangle into one center piece and two end pieces;
recursively computing intersections between a first type from said first set and a second type from said second set, wherein each of said first type and said second type is one selected from a group consisting of said end pieces and said small hyper-rectangles;
computing intersections between center pieces from said first set and second set and with said small hyper-rectangles from said first set and said second set by partitioning said (d−
1)-and-a-half dimensional join along d−
1 dimensions and by processing through said steps of dividing, classifying, partitioning and recursively computing intersections for each of said d-2 dimensions; and
reporting to the secondary memory all computed intersections. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
creating a first plurality of lists for containing hyper-rectangles from said first set and a second plurality of list for containing hyper-rectangles from said second set; and
populating one of said first plurality of lists and said second plurality of lists when a selected hyper-rectangle is a member of one of said first set and said second set and deleting all hyper-rectangles from a remaining one of said first set and said second set not intersecting said selected hyper-rectangle.
-
-
3. The method of claim 2, wherein said a single block B of each of said first plurality of lists and said second plurality of lists is kept in the main memory.
-
4. The method of claim 2, wherein said selected hyper-rectangle is added to a corresponding single block B after determining set membership and writing out said corresponding single block B when said single block B is full.
-
5. The method of claim 2, wherein non-intersections are determined by reading an entire list, scanning for intersections, deleting non-intersecting hyper-rectangles and writing out to the secondary memory all retained hyper-rectangles.
-
6. The method of claim 1, wherein all intersections are reported only once.
-
7. The method of claim 1, further including the steps of:
-
sorting said first set and said second set by their lower boundaries in a selected axis; and
outputting a combined list of hyper-rectangles.
-
-
8. The method of claim 7, wherein said step of computing intersections further includes the steps of:
-
sorting given hyper-rectangles by their lower boundaries in a selected axis to form a single list;
scanning said single list in order of increasing lower boundaries with respect to a given axis for every hyper-rectangle contained therein;
determining a set membership for a selected hyper-rectangle;
determining whether said selected hyper-rectangle is contained within a single strip;
inserting said selected hyper-rectangle into one of a plurality of lists corresponding to said set membership;
deleting all hyper-rectangles not contained in at least one list of a plurality of lists corresponding to a non-set membership; and
writing out to the secondary memory said selected hyper-rectangle and writing out to the secondary memory appropriate end pieces of said selected hyper-rectangle when said selected hyper-rectangle is contained within at least two strips.
-
-
9. An apparatus for performing a d-dimensional join of a first set and a second set of hyper-rectangles, comprising:
-
a secondary memory for storing at least one of said first set and said second set;
a processor having a main memory, said processor coupled to said secondary memory;
said processor being operable to divide the d-dimensional join into k d-dimensional join strips and a (d−
1)-and-a-half dimensional join;
said processor being further operable to classify hyper-rectangles within said strips as large if they are contained within more than a single strip and small if they are contained within said single strip;
said processor being further operable to partition each large hyper-rectangle into one center piece and two end pieces;
said processor being further operable to recursively compute intersections between a first type from said first set and a second type from said second set, wherein said first type and said second type is one selected from a group consisting of said end pieces and said small hyper-rectangles; and
said processor being further operable to compute intersections between center pieces from said first set and second set and with small hyper-rectangles from said first set and said second set by partitioning said (d−
1)-and-a-half dimensional join along d−
1 dimensions and by dividing, classifying, partitioning and recursively computing intersections for each of said d−
2 dimensions.- View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
said processor is further operable to create a first plurality of lists for containing hyper-rectangles from said first set and a second plurality of list for containing hyper-rectangles from said second set; and
said processor is further operable to populate one of said first plurality of lists and said second plurality of lists when a selected hyper-rectangle is a member of one of said first set and said second set and deleting all hyper-rectangles from a remaining one of said first set and said second set not intersecting said selected hyper-rectangle.
-
-
11. The apparatus of claim 10, wherein a single block B of each of said first plurality of lists and said second plurality of lists is kept in said main memory.
-
12. The apparatus of claim 10, wherein said selected hyper-rectangle is added to a corresponding single block B after determining set membership and writing out said corresponding single block B to said secondary memory when said single block B is full.
-
13. The apparatus of claim 10, wherein said processor is further operable to determine non-intersections by reading an entire list, scanning for intersections, deleting non-intersecting hyper-rectangles and writing out to said secondary memory all retained hyper-rectangles.
-
14. The apparatus of claim 9, wherein said processor reports all intersections only once.
-
15. The apparatus of claim 9, wherein said processor is further operable to output a sorted combined list from said first set and said second set by the lower boundaries of hyper-rectangles contained therein.
-
16. The apparatus of claim 15, wherein:
-
said processor is further operable to sort given hyper-rectangles by their lower boundaries in a selected axis to form a single list;
said processor is further operable to scan said single list in order of increasing lower boundaries with respect to a given axis for every hyper-rectangle contained therein;
said processor is further operable to determine a set membership for a selected hyper-rectangle;
said processor is further operable to determine whether said selected hyper-rectangle is contained within a single strip;
said processor is further operable to insert said selected hyper-rectangle into one of a plurality of lists corresponding to said set membership;
said processor is further operable to delete all hyper-rectangles not contained in at least one list of a plurality of lists corresponding to a non-set membership; and
said processor is further operable to write out to said secondary memory said selected hyper-rectangle and write out to said secondary memory appropriate end pieces of said selected hyper-rectangle when said selected hyper-rectangle is contained within at least two strips.
-
-
17. A method for performing a two-dimensional join of a first set and a second set of rectangles, one of said first set and said second set being stored in secondary memory, comprising the steps of:
-
partitioning the two-dimensional join along a selected axis into k two dimensional join strips;
classifying rectangles within said strips as large if they are contained within more than a single strip and small if they are contained within said single strip;
partitioning each large rectangle into one center piece and two end pieces;
computing intersections between center pieces from said first set and second set and with said small rectangles from said first set and said second set;
recursively computing intersections between a first type from said first set and a second type from said second set, wherein said first type and said second type is one selected from a group consisting of said end pieces and said small rectangles; and
reporting to the secondary memory all computed intersections. - View Dependent Claims (18, 19, 20, 21, 22, 23)
creating a first plurality of lists for containing rectangles from said first set and a second plurality of list for containing rectangles from said second set; and
populating one of said first plurality of lists and said second plurality of lists when a selected rectangle is a member of one of said first set and said second set and deleting all rectangles from a remaining one of said first set and said second set not intersecting said selected rectangle.
-
-
19. The method of claim 18, wherein a single block B of each of said first plurality of lists and said second plurality of lists is kept in the main memory.
-
20. The method of claim 17, wherein all intersections are reported only once.
-
21. The method of claim 17, further including the step of sorting said first set and said second set by their lower boundaries in a given axis to form a single list of rectangles.
-
22. The method of claim 21, wherein said step of computing intersections further includes the steps of:
-
scanning said single list in order of increasing lower boundaries with respect to a given axis for every interval contained therein;
determining a set membership for a selected interval;
determining whether said selected interval is contained within a single strip;
inserting said selected interval into one of a plurality of lists corresponding to said set membership;
deleting all intervals not contained in at least one list of a plurality of lists corresponding to a non-set membership; and
writing out to the secondary memory said selected interval and writing out to the secondary memory appropriate end pieces of said selected interval when said selected interval is contained within at least two strips.
-
-
23. The method of claim 17, wherein said step of recursively computing further includes the step of tracking intervals whose end points extend beyond a current strip by storing them in a set of lists.
Specification