Generating composite key relationships between database objects based on sampling
Abstract
According to one embodiment of the present invention, a system determines key relationships between database tables and includes a computer system including at least one processor. The system determines a sampling range for one or more matching columns between first and second database tables. The matching columns satisfy one or more matching criteria and the sampling range is based on quantities of distinct values within the matching columns. Data is sampled from the first and second database tables in accordance with the sampling ranges to determine a sample set. Keys between the first and second database tables are determined based on matching between columns within the sample set. Embodiments of the present invention further include a method and computer program product for determining key relationships between database tables in substantially the same manner described above.
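The sampling and matching steps described in the abstract can be illustrated with a minimal sketch. It assumes integer column values, an inclusive sampling range that has already been chosen, and a simple "fraction of distinct values found in the other column" measure of matching. The names `sample_in_range`, `match_ratio`, and the example columns are hypothetical and do not come from the patent, which does not prescribe a particular matching measure.

```python
from typing import Iterable, List


def sample_in_range(values: Iterable[int], lo: int, hi: int) -> List[int]:
    """Keep only the values inside the inclusive sampling range [lo, hi]."""
    return [v for v in values if lo <= v <= hi]


def match_ratio(child: Iterable[int], parent: Iterable[int]) -> float:
    """Fraction of distinct child values that also appear in parent.

    This is an assumed matching measure, used here for illustration only.
    """
    child_set, parent_set = set(child), set(parent)
    if not child_set:
        return 0.0
    return len(child_set & parent_set) / len(child_set)


# Hypothetical matching columns from two tables.
orders_customer_id = [3, 4, 4, 5, 7, 9, 9, 12]
customers_id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Assumed sampling range; both tables are sampled with the same bounds.
lo, hi = 4, 9
sample_child = sample_in_range(orders_customer_id, lo, hi)
sample_parent = sample_in_range(customers_id, lo, hi)

# A high ratio within the sample suggests a candidate key relationship.
sample_ratio = match_ratio(sample_child, sample_parent)
```

Applying the same bounds to both tables keeps the two samples comparable: a value present in one sampled column is only missing from the other if it is also missing in the full table.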
12 Claims
1. A computer-implemented method of determining key relationships between database tables comprising:
- determining a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes:
  - identifying a median value within a set of ordered column values for the matching columns; and
  - assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set;
- sampling data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and
- determining keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.
(Dependent claims: 2, 3, 4)
5. A system for determining key relationships between database tables comprising:
a computer system including at least one processor configured to:
- determine a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes:
  - identifying a median value within a set of ordered column values for the matching columns; and
  - assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set;
- sample data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and
- determine keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.
(Dependent claims: 6, 7, 8)
9. A computer program product for determining key relationships between database tables comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to:
- determine a sampling range for one or more matching columns between first and second database tables, wherein the matching columns satisfy one or more matching criteria and the sampling range is defined by minimum and maximum column values and is based on quantities of distinct values within the matching columns, and wherein determining a sampling range includes:
  - identifying a median value within a set of ordered column values for the matching columns; and
  - assigning consecutive column values from the ordered set less than the median value to the minimum column value and assigning consecutive column values from the ordered set greater than the median value to the maximum column value until the minimum and maximum column values of the sampling range produce a desired size for a sample set;
- sample data from the first and second database tables with values complying with the minimum and maximum column values of the sampling range for the one or more matching columns to determine the sample set; and
- determine keys between the first and second database tables based on a comparison of matching between columns within the sample set and matching between columns of a full data set of the first and second database tables.
(Dependent claims: 10, 11, 12)
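The median-based range determination recited in the independent claims can be sketched roughly as follows. This is one possible reading, not the patented implementation. It assumes integer column values, starts with both bounds at the median of the ordered distinct values (the upper median when the count is even), widens the range by one distinct value on each side per step, and measures "size" as the number of rows of a single column that fall inside the range. The function name `sampling_range` and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def sampling_range(column_values: Sequence[int], desired_size: int) -> Tuple[int, int]:
    """Grow an inclusive [min, max] range outward from the median value.

    Assumption: the sample size is the number of rows in column_values
    whose value lies within the current range.
    """
    ordered = sorted(set(column_values))  # ordered set of distinct values
    if not ordered:
        raise ValueError("column has no values")

    # Start with both bounds on the median of the distinct values.
    lo_idx = hi_idx = len(ordered) // 2

    def rows_covered(lo_i: int, hi_i: int) -> int:
        lo, hi = ordered[lo_i], ordered[hi_i]
        return sum(1 for v in column_values if lo <= v <= hi)

    while rows_covered(lo_idx, hi_idx) < desired_size:
        moved = False
        if lo_idx > 0:
            lo_idx -= 1  # next smaller value becomes the minimum
            moved = True
        if hi_idx < len(ordered) - 1:
            hi_idx += 1  # next larger value becomes the maximum
            moved = True
        if not moved:
            break  # the whole column is already covered

    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical column with repeated values.
values = [1, 2, 2, 3, 5, 5, 5, 8, 9, 13, 21]
bounds = sampling_range(values, desired_size=6)
```

With the example data the distinct ordered values are 1, 2, 3, 5, 8, 9, 13, 21 and the starting value is 8. The range [5, 9] covers five rows, so it is widened once more to [3, 13], which covers seven rows and meets the desired size of six.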
Specification