Merge join process
First Claim
1. In a computer system having at least one processor, a method of joining data rows from an outer table with data rows from an inner table when the inner table rows are indexed on a data column common to both tables, the method comprising the steps of:
- creating from the outer table a set of outer rows that satisfy a selection criteria;
ensuring the set is sorted on a join column, wherein the join column is the common data column; and
searching for a matching inner row for each outer row in sequence in the set until all the outer rows in the set have been searched, wherein initially a page-finder row is equated to the outer row first in set sequence, the searching for a matching inner row comprising the steps of;
identifying an inner table data page based on the value of the join column of the page-finder record;
retrieving the identified inner table data page;
searching the inner table data page for a first match row, wherein the first match row is an inner row with a join column value that matches the join column value of the page-finder row, the searching of the inner table data page for a first match row comprising the steps of;
discarding the page-finder row if no first match row is found; and
if the first match row is found, marking the first match row as a join row, assigning to a next key the value of the join column of the inner row that sequentially follows the first match row on the inner table data page, and assigning to a last key the largest join column value of an inner row stored on the inner table data page; and
if a first match is found on the inner table data page, searching the inner table data page until the value of the join column of an outer row is greater than the last key, the searching of the data page for a next match row comprising the steps of;
selecting an outer row that follows the page-finder row in set sequence;
discarding the outer row if the outer row has a join column value less than the next key;
if the join column value of the outer row is not less than the next key, searching the data page for a next match row, wherein the next match row is an inner row with a join column value that matches the join column value of the outer row; and
if a next match row is found, marking the next match row as a join row and assigning to the next key the value of the join column of the inner row that sequentially follows the next match row on the inner table data page; and
if the join column value of an outer row is greater than the last key, equating the page-finder row to the outer row with the join column value that is greater than the last key.
12 Assignments
0 Petitions
Accused Products
Abstract
A merge join process combines rows from an inner and an outer table when the inner table is indexed on a data column that is common to both tables. The merge join process creates a set of rows from the outer table that satisfy a selection criteria and sorts the rows in the set on the common data column if necessary. The merge join process searches for a matching inner row for each outer row in sequence using the inner table indices until it finds a matching inner row on a data page. The data page is then repeatedly searched for matches on the successive outer rows in the set until the end of the data page is reached. The end of the data page is reached when the value of the common data column in an outer row is greater than a last key that represents the highest value of the common data column in a inner row stored on the data page. The merge join process also determines that a matching inner row does not exist in the inner table when the value of the common data column in an outer row is less than a next key that represents the lowest value of the common data column that has not yet been searched for a match. Thus, the next and last keys reduce the number of scans of the data page. All matching inner rows on the data page are marked as join rows; all outer rows with common data column values between the next and last keys but which did not have matching inner rows are discarded. When the end of the data page is reached, another data page is located and the search is repeated. The subsequent data page is located using the inner table indices or by traversing links between data pages.
108 Citations
14 Claims
-
1. In a computer system having at least one processor, a method of joining data rows from an outer table with data rows from an inner table when the inner table rows are indexed on a data column common to both tables, the method comprising the steps of:
-
creating from the outer table a set of outer rows that satisfy a selection criteria;
ensuring the set is sorted on a join column, wherein the join column is the common data column; and
searching for a matching inner row for each outer row in sequence in the set until all the outer rows in the set have been searched, wherein initially a page-finder row is equated to the outer row first in set sequence, the searching for a matching inner row comprising the steps of;
identifying an inner table data page based on the value of the join column of the page-finder record;
retrieving the identified inner table data page;
searching the inner table data page for a first match row, wherein the first match row is an inner row with a join column value that matches the join column value of the page-finder row, the searching of the inner table data page for a first match row comprising the steps of;
discarding the page-finder row if no first match row is found; and
if the first match row is found, marking the first match row as a join row, assigning to a next key the value of the join column of the inner row that sequentially follows the first match row on the inner table data page, and assigning to a last key the largest join column value of an inner row stored on the inner table data page; and
if a first match is found on the inner table data page, searching the inner table data page until the value of the join column of an outer row is greater than the last key, the searching of the data page for a next match row comprising the steps of;
selecting an outer row that follows the page-finder row in set sequence;
discarding the outer row if the outer row has a join column value less than the next key;
if the join column value of the outer row is not less than the next key, searching the data page for a next match row, wherein the next match row is an inner row with a join column value that matches the join column value of the outer row; and
if a next match row is found, marking the next match row as a join row and assigning to the next key the value of the join column of the inner row that sequentially follows the next match row on the inner table data page; and
if the join column value of an outer row is greater than the last key, equating the page-finder row to the outer row with the join column value that is greater than the last key. - View Dependent Claims (2, 3, 4, 5, 6)
determining an inner table index record that encompasses the value of the join column of the page-finder row; and
searching the inner table index record to identify the inner table data page.
-
-
3. The method of claim 1, wherein the inner index table data pages are linked and the step of searching the inner table data page until the value of the join column of an outer row is greater than the last key further comprises the step of:
identifying a next inner table data page to search by traversing the links when all the inner rows on the data page currently being searched have been evaluated and the join column value of the outer row is equal to the last key on the data page currently being searched.
-
4. The method of claim 1, wherein the step of searching the data page for a next match row further comprises the step of skipping the outer row next in sequence if the join column value of the outer row next in sequence is identical to the join column value of the outer row immediately prior in sequence.
-
5. The method of claim 1, wherein the computer system has multiple processors and further comprising the steps of dividing the set of outer rows into subsets and assigning each subset to a different processor so that the step of searching for a matching inner row for each outer row is performed substantially in parallel by the multiple processors.
-
6. The method of claim 1, wherein the computer system has multiple processors and the step of creating from the outer table a set of outer rows that satisfy a selection criteria and ensuring the set is sorted on a join column are performed substantially in parallel by the multiple processors.
-
7. A computer-readable medium having computer-executable modules for joining data rows from two tables which have a common data column, where one table is an inner table stored in an data structure based on values in the common data column and the other table is an outer table, the computer-executable modules comprising:
-
a sort_outer_hit_records procedure for creating a set of outer rows from the outer table sorted on the common data column, wherein each outer row in the set satisfies a selection criteria, the sort_outer_hit_records procedure further for returning an outer row from the set in sequential order;
a relational storage manager procedure for searching the data structure and returning a data page of inner rows, wherein the data page returned is determined by the value of the common data column in one of the outer row returned by the sort_outer_hit_records procedure;
a fetch procedure for setting values for a next key and a last key for the data page returned by the relational storage manager procedure; and
a multi_fetch procedure for receiving the data page from the relational storage manager procedure and the outer row from the sort_outer_hit_records procedure, for determining whether to search the data page based on comparing the value of the common data column of the outer row with the next key and the last key, for performing a search of the data page for an inner row having a value in the common data column that matches the value in the common data column of the outer row, and for discarding an unmatched outer row or returning a matching inner row as a result of the search. - View Dependent Claims (8, 9, 10)
-
-
11. A merge join process for joining data rows from two tables which have a common data column, where one table is an inner table stored in an data structure based on values in the common data column and the other table is an outer table, the merge join process comprising:
-
sorting means for creating a set of outer rows from the outer table sorted on the common data column, wherein each outer row in the set satisfies a selection criteria, the sorting means further for returning an outer row from the set in sequential order;
data storage means for selecting a data page of inner rows based on the value of the common data column in one of the sorted outer rows;
fetch means for setting values for a next key and a last key for the selected data page; and
searching means for determining whether to search the selected data page by comparing the value of the common data column of the outer row with the next key and the last key, for performing a search of the selected data page for an inner row that matches the outer row on the common data column, and for discarding an unmatched outer row or returning a matching inner row as a result of the search. - View Dependent Claims (12, 13, 14)
-
Specification