Automatic consistent sampling for data analysis
Abstract
A method, computer program product, and system for analyzing data within one or more databases, comprising selecting one or more databases for analysis, each database comprising one or more database objects comprising one or more data values, applying a function to each data value in each database object within the one or more databases, where the function produces function values limited to a predetermined range, identifying for analysis the data values producing a certain function value within the predetermined range to form a sampled data set, and analyzing the sampled data set to determine relationships between the database objects within and across the one or more databases.
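The abstract does not name the function. A common choice for this kind of consistent sampling is a hash reduced modulo the size of the range, and the sketch below assumes exactly that: SHA-256 over the value's string form, a range of 16 and a selected function value of 0, all of which are hypothetical. It illustrates the property that makes the sample useful for finding relationships: a value that occurs in two columns is either kept in both samples or dropped from both.

```python
import hashlib

# Hypothetical parameters: the abstract only says the function values are
# "limited to a predetermined range"; the size of that range and the chosen
# function value below are illustrative assumptions.
RANGE = 16         # function values fall in 0..RANGE-1
CHOSEN_VALUE = 0   # the function value whose data values form the sample


def bucket(value) -> int:
    """Map a data value deterministically into the range [0, RANGE).

    Values are normalized to their string form (an assumption) and hashed with
    SHA-256, because Python's built-in hash() of a str varies between runs.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RANGE


def consistent_sample(values) -> set:
    """Keep the distinct data values whose function value equals CHOSEN_VALUE."""
    return {v for v in values if bucket(v) == CHOSEN_VALUE}


# Hypothetical columns: a key column and a column that references it.
customer_ids = range(1, 1001)
order_customer_ids = [c for c in range(1, 1001) if c % 3 == 0]

s_cust = consistent_sample(customer_ids)
s_ord = consistent_sample(order_customer_ids)

# Because the same value always yields the same function value, every sampled
# referencing value also appears in the sampled key column.
print(len(s_cust), len(s_ord), s_ord <= s_cust)
```

Each sample holds roughly 1/RANGE of the distinct values. The point of hashing rather than random sampling: sampling two columns independently at rate p keeps a shared value in both samples only with probability p², whereas a common hash keeps it in both with probability p, so overlaps between columns survive the sampling.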
17 Claims
1. A computer-implemented method of analyzing data within one or more databases, comprising:
selecting one or more databases for analysis, each database comprising one or more database tables with one or more columns including one or more data values;
applying a function to each data value of one or more columns in the database tables within the selected one or more databases, wherein the function produces different function values for the data values limited to a predetermined range;
identifying for analysis the data values producing a same function value within the predetermined range to form a sampled data set;
analyzing the sampled data set by matching data values from columns of different database tables within the sampled data set to determine key relationships between columns of the database tables within and across the selected one or more databases, wherein the key relationships between the columns of the database tables are determined without key relationships between the columns of the database tables being known beforehand; and
retrieving data from a plurality of the database tables of the selected one or more databases based on the determined key relationships.
Dependent claims: 2, 3, 4, 5, 6.
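Claim 1 does not specify how matched values become key relationships. One plausible reading, sketched below, is an inclusion test: a column is reported as referencing another table's column when nearly all of its distinct sampled values appear in that column and the sampled column has no duplicates, as a key would not. The in-memory table layout, the threshold of 0.9, and the names `find_key_relationships` and `join_rows` are hypothetical assumptions, not taken from the patent.

```python
import hashlib
from itertools import permutations

# Hypothetical parameters, not taken from the claim.
RANGE = 4          # predetermined range of function values: 0..RANGE-1
CHOSEN_VALUE = 0   # function value that selects the sampled data set
THRESHOLD = 0.9    # minimum containment to report a key relationship


def bucket(value) -> int:
    # Deterministic hash, so equal values get equal function values in every table.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RANGE


def find_key_relationships(tables):
    """tables: {table: {column: [values...]}}, a hypothetical in-memory layout.

    Returns (child_table, child_col, parent_table, parent_col) tuples where the
    sampled parent column has no duplicates and nearly all distinct sampled
    child values also occur in it. No relationship is given as input.
    """
    # Sample every column with the same function; duplicates are kept so that
    # uniqueness of a candidate key can still be checked on the sample.
    samples = {
        (t, c): [v for v in vals if bucket(v) == CHOSEN_VALUE]
        for t, cols in tables.items()
        for c, vals in cols.items()
    }
    found = []
    for (ct, cc), (pt, pc) in permutations(samples, 2):
        if ct == pt:
            continue  # only match columns of different tables
        child, parent = set(samples[(ct, cc)]), samples[(pt, pc)]
        if not child or len(parent) != len(set(parent)):
            continue  # nothing sampled, or the parent column is not unique
        if len(child & set(parent)) / len(child) >= THRESHOLD:
            found.append((ct, cc, pt, pc))
    return found


def join_rows(tables, ct, cc, pt, pc):
    """Retrieve combined rows from the full tables using a discovered relationship."""
    child, parent = tables[ct], tables[pt]
    index = {key: j for j, key in enumerate(parent[pc])}
    rows = []
    for i, key in enumerate(child[cc]):
        j = index.get(key)
        if j is not None:
            row = {f"{ct}.{c}": vals[i] for c, vals in child.items()}
            row.update({f"{pt}.{c}": vals[j] for c, vals in parent.items()})
            rows.append(row)
    return rows


# Hypothetical data: no key relationship is declared anywhere.
tables = {
    "customers": {
        "id": list(range(1, 201)),
        "name": [f"cust{i}" for i in range(1, 201)],
    },
    "orders": {
        "order_id": list(range(1001, 1601)),
        "cust_id": [k % 200 + 1 for k in range(600)],
    },
}
rels = find_key_relationships(tables)
rows = join_rows(tables, *rels[0])
print(rels, len(rows))
```

Because both columns are filtered by the same function, the containment measured on the samples estimates the containment on the full columns, while only about 1/RANGE of the values are compared. The retrieval step, by contrast, reads the full tables.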
7. A computer program product for analyzing data within one or more databases, comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to:
select one or more databases for analysis, each database comprising one or more database tables with one or more columns including one or more data values;
apply a function to each data value of one or more columns in the database tables within the selected one or more databases, wherein the function produces different function values for the data values limited to a predetermined range;
identify for analysis the data values producing a same function value within the predetermined range to form a sampled data set;
analyze the sampled data set by matching data values from columns of different database tables within the sampled data set to determine key relationships between columns of the database tables within and across the selected one or more databases, wherein the key relationships between the columns of the database tables are determined without key relationships between the columns of the database tables being known beforehand; and
retrieve data from a plurality of the database tables of the selected one or more databases based on the determined key relationships.
Dependent claims: 8, 9, 10, 11, 12.
13. A system for analyzing data within one or more databases, comprising:
one or more databases, each database comprising one or more database tables with one or more columns including one or more data values stored in memory; and
a processor configured with logic to:
select one or more databases from the one or more databases for analysis;
apply a function to each data value of one or more columns in the database tables within the selected one or more databases, wherein the function produces different function values for the data values limited to a predetermined range;
identify for analysis the data values producing a same function value within the predetermined range to form a sampled data set;
analyze the sampled data set by matching data values from columns of different database tables within the sampled data set to determine key relationships between columns of the database tables within and across the selected one or more databases, wherein the key relationships between the columns of the database tables are determined without key relationships between the columns of the database tables being known beforehand; and
retrieve data from a plurality of the database tables of the selected one or more databases based on the determined key relationships.
Dependent claims: 14, 15, 16, 17.
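For the system of claim 13, where the tables are stored in a database and a processor runs the logic, the same steps can be pushed into the database engine. The sketch below assumes SQLite through Python's `sqlite3` module: the hashing function is registered with `Connection.create_function` so the sampling predicate runs inside each query, candidate columns are read from the catalog with `PRAGMA table_info`, and the retrieval step is an ordinary SQL join. The tables, parameters and helper names (`catalog`, `containment`, `discover`, `retrieve`) are hypothetical.

```python
import hashlib
import sqlite3
from itertools import permutations

# Hypothetical parameters, not taken from the claim.
RANGE = 4
CHOSEN_VALUE = 0
THRESHOLD = 0.9


def bucket(value) -> int:
    # Deterministic hash, so equal values get equal function values in every table.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RANGE


def catalog(conn):
    """(table, column) pairs read from SQLite's own catalog."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return [(t, r[1]) for t in tables
            for r in conn.execute(f'PRAGMA table_info("{t}")')]


# Identifiers are interpolated into SQL only because they come from the catalog.
def sample_is_unique(conn, t, c):
    # The sampling predicate bucket(...) = ? is evaluated inside the query.
    (ok,) = conn.execute(
        f'SELECT COUNT(*) = COUNT(DISTINCT "{c}") FROM "{t}" WHERE bucket("{c}") = ?',
        (CHOSEN_VALUE,)).fetchone()
    return bool(ok)


def containment(conn, ct, cc, pt, pc):
    # Fraction of distinct sampled child values found in the sampled parent column.
    total, hits = conn.execute(
        f'''WITH c AS (SELECT DISTINCT "{cc}" AS v FROM "{ct}" WHERE bucket("{cc}") = :b),
                 p AS (SELECT DISTINCT "{pc}" AS v FROM "{pt}" WHERE bucket("{pc}") = :b)
            SELECT COUNT(*), COUNT(p.v) FROM c LEFT JOIN p ON c.v = p.v''',
        {"b": CHOSEN_VALUE}).fetchone()
    return hits / total if total else 0.0


def discover(conn):
    # Compare columns of different tables; no key relationship is declared.
    return [(ct, cc, pt, pc)
            for (ct, cc), (pt, pc) in permutations(catalog(conn), 2)
            if ct != pt and sample_is_unique(conn, pt, pc)
            and containment(conn, ct, cc, pt, pc) >= THRESHOLD]


def retrieve(conn, ct, cc, pt, pc):
    # Retrieval runs over the full tables, joined on the discovered columns.
    return conn.execute(
        f'SELECT * FROM "{ct}" JOIN "{pt}" ON "{ct}"."{cc}" = "{pt}"."{pc}"').fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("bucket", 1, bucket)  # expose the function to SQL
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"cust{i}") for i in range(1, 201)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1001 + k, k % 200 + 1) for k in range(600)])

rels = discover(conn)
rows = retrieve(conn, *rels[0])
print(rels, len(rows))
```

Evaluating the predicate in the engine means only the sampled values leave each scan during discovery; only the final join touches every row.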
Specification