Autonomic recommendation and placement of materialized query tables for load distribution
First Claim
1. A computer-implemented method of placing materialized query tables (MQTs) in a distributed database system for improving load distribution and reducing network latency, said method comprising:
- inputting, by a user, to a materialized query table advisor (MQTA) of said distributed databases system,data placement destinations for MQTs in a frontend database and backend databases,a workload comprising a database update log and a read workload, anda simulated catalog of backend databases;
deriving, by said MQTA, candidate MQTs based on common parts of query statements;
calculating, by said MQTA, a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time;
deriving, by said MQTA, dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement;
deriving, by said MQTA, a total benefit for each dependency in terms of said resource time;
outputting, by said MQTA, to a data placement advisor (DPA), a query statement, an MQT identification, and a total benefit, corresponding to said workload;
inputting, by said user, to said DPA, database sizes allocated for said candidate MQTs in said frontend database and said backend databases,wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases;
measuring, by said DPA, a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost;
performing, by said DPA, a what-if data placement analysis comprising;
creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MOT'"'"'s size;
for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and
selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and
re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and
using, by a data placement manager, said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
creating frontend MQTs, based on created backend MQTs; and
synchronizing said created frontend MQTs with said created backend MQTs.
5 Assignments
0 Petitions
Accused Products
Abstract
A system and method of evaluating queries in distributed databases with MQTs comprises deriving MQTs; replicating the derived MQTs from a local server to at least one remote server; and distributing data and replicated derived MQTs to a plurality of other remote servers, wherein the distributing increases overall query execution efficiency. The databases may comprise heterogeneous databases. The query execution efficiency comprises observed response time at a frontend database and associated costs comprising computational central processing unit costs, input/output costs, and network communication costs. All of the associated costs comprise statistically estimated costs. The method further comprises running a MQT advisor at a frontend database, and considering the costs of at least one MQT placed at the frontend database. The method further comprises running a MQT advisor at a non-frontend database. Additionally, the increased overall query execution efficiency may consider all dependencies of all involved database instances and associated costs.
47 Citations
21 Claims
-
1. A computer-implemented method of placing materialized query tables (MQTs) in a distributed database system for improving load distribution and reducing network latency, said method comprising:
-
inputting, by a user, to a materialized query table advisor (MQTA) of said distributed databases system, data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases; deriving, by said MQTA, candidate MQTs based on common parts of query statements; calculating, by said MQTA, a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time; deriving, by said MQTA, dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement; deriving, by said MQTA, a total benefit for each dependency in terms of said resource time; outputting, by said MQTA, to a data placement advisor (DPA), a query statement, an MQT identification, and a total benefit, corresponding to said workload; inputting, by said user, to said DPA, database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases; measuring, by said DPA, a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost; performing, by said DPA, a what-if data placement analysis comprising; creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MOT'"'"'s size; for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and using, by a data placement manager, said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
creating frontend MQTs, based on created backend MQTs; and
synchronizing said created frontend MQTs with said created backend MQTs. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer program storage medium readable by computer, tangibly embodying a program of instructions executable by said computer to perform a method of placing materialized query tables (MQTs) in a distributed database system by improving load distribution and reducing network latency, said method comprising:
-
inputting data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases; deriving candidate MQTs based on common parts of query statements; calculating a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time; deriving dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement; deriving a total benefit for each dependency in terms of said resource time; outputting a query statement, an MQT identification, and a total benefit, corresponding to said workload; inputting database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases; measuring a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost; performing a what-if data placement analysis comprising; creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MQT'"'"'s size; for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and using said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
creating frontend MQTs, based on created backend MQTs; and
synchronizing said created frontend MQTs with said created backend MQTs. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A distributed database system for placing materialized query tables (MQTs) in said distributed database system by improving load distribution and reducing network latency, said distributed database system comprising:
-
a memory, connected to a materialized query table advisor (MQTA) of said distributed databases system, to which a user inputs; data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases; and a processor configured to; derive, by said MQTA, candidate MQTs based on common parts of query statements; calculate, by said MQTA, a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time; derive, by said MQTA, dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement; derive, by said MQTA, a total benefit for each dependency in terms of said resource time; output, by said MQTA, to a data placement advisor (DPA), a query statement, an MQT identification, and a total benefit, corresponding to said workload; input, by said use to said DPA, database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases; measure, by said DPA, a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost; perform, by said DPA, a what-if data placement analysis comprising; creating a ranked list of said candidate MQTs, based on return on investment (ROJ), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MQT'"'"'s size; for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and use, by a data placement manager, said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
creating frontend MQTs, based on created backend MQTs; and
synchronizing said created frontend MQTs with said created backend MQTs. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
Specification