Systems and methods for optimizing database queries
First Claim
1. A computer-implemented method of optimizing execution of a query on a relational database having base tables with records for consumers representing members of a general population, said records including attributes related to the consumers included in the general population, the method comprising:
- receiving a query with an identified population of interest, the population of interest representing a number of consumers less than the general population, and the query not requiring execution on the general population;
determining whether the population of interest is below a threshold size with respect to the general population;
if the population of interest is below the threshold size;
generating abbreviated versions of the base tables, the abbreviated versions of the base tables having records for members of the general population that are within the population of interest, the abbreviated versions of the base tables comprising at least two tables selected from a group comprising;
a consumer information view, an aggregated view of accounts, an aggregated view of inquiries, and an aggregated view of public data;
generating an abbreviated virtual table that is customized to the population of interest and that includes records based on the records in the abbreviated versions of the base tables; and
performing the query on the customized, abbreviated virtual table; and
if the population of interest is above the threshold size, performing the query on a non-abbreviated version of the virtual table that is generated using non-abbreviated versions of the base tables.
1 Assignment
0 Petitions
Accused Products
Abstract
A relational database warehouse system with query optimization capabilities is described that allows for speedy identification of sets of records of interest from amongst tens of millions of records. The records of interest may include complex derived attributes, generated, at least in part, by aggregating data from a plurality of records in base data tables. In various embodiments, the query optimization capabilities allow the database warehouse system to identify conditions under which normal query execution may be replaced by one or more optimized execution methods, including, for example, eliminating unnecessary inner join operations on base data tables specified by a query, re-ordering the execution of group-by operations and left-outer join operations to greatly reduce the size of join tables produced while processing a query, and/or consolidating a set of segmentation queries for execution in one pass over the records of the database.
371 Citations
27 Claims
-
1. A computer-implemented method of optimizing execution of a query on a relational database having base tables with records for consumers representing members of a general population, said records including attributes related to the consumers included in the general population, the method comprising:
-
receiving a query with an identified population of interest, the population of interest representing a number of consumers less than the general population, and the query not requiring execution on the general population; determining whether the population of interest is below a threshold size with respect to the general population; if the population of interest is below the threshold size; generating abbreviated versions of the base tables, the abbreviated versions of the base tables having records for members of the general population that are within the population of interest, the abbreviated versions of the base tables comprising at least two tables selected from a group comprising;
a consumer information view, an aggregated view of accounts, an aggregated view of inquiries, and an aggregated view of public data;generating an abbreviated virtual table that is customized to the population of interest and that includes records based on the records in the abbreviated versions of the base tables; and performing the query on the customized, abbreviated virtual table; and if the population of interest is above the threshold size, performing the query on a non-abbreviated version of the virtual table that is generated using non-abbreviated versions of the base tables. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer-readable medium comprising computer program code that configures a processor to perform a method of optimizing execution of a query on a relational database having base tables with records for consumers in a general population, said records including attributes related to the consumers in the general population, the medium comprising:
-
program code for receiving a query that includes a population of interest, the population of interest representing a number of consumers less than the general population, and the query not requiring execution on the general population; program code for determining whether the population of interest is below a threshold size with respect to the general population; program code that generates, if the population of interest is below the threshold size, abbreviated versions of the base tables, the abbreviated versions of the base tables including records for members of the general population that are within the population of interest, the abbreviated versions of the base tables comprising at least two tables selected from a group comprising;
a consumer information view, an aggregated view of accounts, an aggregated view of inquiries, and an aggregated view of public data;program code for generating, if the population of interest is below the threshold size, an abbreviated virtual table that is customized to the population of interest and that includes records based on the records in the abbreviated versions of the base tables; and program code for performing the query on the customized, abbreviated virtual table, if the population of interest is below the threshold size; and program code for performing the query on a non-abbreviated version of the virtual table that is generated using non-abbreviated versions of the base tables, if the population of interest is above the threshold size.
-
-
9. An apparatus for executing a query on a relational database comprising means configured to perform a method of optimizing execution of a query on a relational database having base tables with records for consumers in a general population, said records including attributes related to the consumers in the general population, the apparatus comprising:
-
means for receiving a query that includes a population of interest, the population of interest representing a number of consumers less than the general population, and the query not requiring execution on the general population; means for determining whether the population of interest is below a threshold size with respect to the general population; means for generating, if the population of interest is below the threshold size, an abbreviated version of the base tables, the abbreviated version of the base tables including records for members of the general population that are within the population of interest, the abbreviated version of the base tables comprising at least two elements selected from a group comprising;
a consumer information view, an aggregated view of accounts, an aggregated view of inquiries, and an aggregated view of public data;means for generating, if the population of interest is below the threshold size, an abbreviated virtual table that is customized to the population of interest and that includes records based on the records in the abbreviated version of the base tables; and means for performing the query on the customized, abbreviated virtual table, if the population of interest is below the threshold size; and means for performing the query on a non-abbreviated version of the virtual table that is generated using non-abbreviated versions of the base tables, if the population of interest is above the threshold size.
-
-
10. A computer-implemented method of optimizing execution of a query on a database, wherein the database comprises records stored in base tables, the method comprising:
-
receiving a query for the relational database, wherein the query pertains to a population of interest, the population of interest representing a number of consumers less than the general population, and the query not requiring execution on the general population; determining a sub-set of records in the base tables that are associated with the population of interest to the query; determining a size of the sub-set relative to a size of the database; when the size of the sub-set exceeds a threshold, executing the query on the database; and when the size of the sub-set is less than the threshold; generating abbreviated versions of the base tables that include those records that are associated with the population of interest, the abbreviated versions of the base tables comprising at least two elements selected from a group comprising;
consumer information data, a view of accounts data, a view of inquiries data, and a view of public data;populating a customized, abbreviated virtual table using the abbreviated versions of the base tables; and executing the query on the customized, abbreviated virtual table. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
-
-
18. A data warehouse system configured to execute queries on a general population about whom data is arranged in a set of base tables of a database, the data warehouse system comprising:
-
storage modules configured to store the base tables about the general population; and a query manager configured to receive a query for the database, the query not requiring execution on the general population, the query manager further being configured to determine a sub-population of the general population, the sub-population being indicated as being of interest to the query, the query manager further configured to determine a size of the sub-population relative to the general population, and the query manager further configured to; execute the query on the database, when the size of the sub-population exceeds a threshold, or when the size of the sub-population is less than the threshold, generate abbreviated versions of the base tables that include those rows that pertain to the sub-population, populate a customized, abbreviated virtual table that is based on the abbreviated versions of the base tables, and execute the query on the customized, abbreviated virtual table; wherein the abbreviated versions of the base tables comprise at least two of the following;
consumer information, a view of accounts data, a view of inquiries data, and a view of public data. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27)
-
Specification