Autonomic recommendation and placement of materialized query tables for load distribution

US 7,689,538 B2
Filed: 01/26/2006
Issued: 03/30/2010
Est. Priority Date: 01/26/2006
Status: Active Grant

Abstract

A system and method of evaluating queries in distributed databases with MQTs comprises deriving MQTs; replicating the derived MQTs from a local server to at least one remote server; and distributing data and replicated derived MQTs to a plurality of other remote servers, wherein the distributing increases overall query execution efficiency. The databases may comprise heterogeneous databases. The query execution efficiency comprises observed response time at a frontend database and associated costs comprising computational central processing unit costs, input/output costs, and network communication costs. All of the associated costs comprise statistically estimated costs. The method further comprises running a MQT advisor at a frontend database, and considering the costs of at least one MQT placed at the frontend database. The method further comprises running a MQT advisor at a non-frontend database. Additionally, the increased overall query execution efficiency may consider all dependencies of all involved database instances and associated costs.

21 Claims

1. A computer-implemented method of placing materialized query tables (MQTs) in a distributed database system for improving load distribution and reducing network latency, said method comprising:
- inputting, by a user, to a materialized query table advisor (MQTA) of said distributed databases system, data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases;
  
  deriving, by said MQTA, candidate MQTs based on common parts of query statements;
  
  calculating, by said MQTA, a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time;
  
  deriving, by said MQTA, dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement;
  
  deriving, by said MQTA, a total benefit for each dependency in terms of said resource time;
  
  outputting, by said MQTA, to a data placement advisor (DPA), a query statement, an MQT identification, and a total benefit, corresponding to said workload;
  
  inputting, by said user, to said DPA, database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases;
  
  measuring, by said DPA, a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost;
  
  performing, by said DPA, a what-if data placement analysis comprising:
  
  creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MQT's size;
  
  for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and
  
  selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and
  
  re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and
  
  using, by a data placement manager, said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
  
  creating frontend MQTs, based on created backend MQTs; and
  
  synchronizing said created frontend MQTs with said created backend MQTs.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein said frontend and said backend comprise heterogeneous databases.
  - 3. The method of claim 1, wherein a query execution efficiency comprises observed response time at said frontend and associated costs comprising computational central processing unit costs, input/output costs, and network communication costs.
  - 4. The method of claim 3, wherein said associated costs comprise statistically estimated costs.
  - 5. The method of claim 4, further comprising running a MQT advisor at a frontend database, and considering said associated costs of at least one MQT placed at said frontend database.
  - 6. The method of claim 4, further comprising running said MQTA at a non-frontend database.
  - 7. The method of claim 3, wherein the query execution efficiency considers all dependencies of all involved database instances and associated costs.
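
The what-if analysis in claim 1 can be sketched as a greedy loop over a ranked list. The sketch below is illustrative only, not the patented implementation: the `Unit` class, the `recommend` function, and all figures are hypothetical, and the subsumption rule (a unit is charged only for member MQTs not yet placed) is one assumed reading of "re-calculating ROIs ... based on subsumption".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate MQT, or a VCU grouping MQTs that one query co-uses."""
    name: str
    members: frozenset   # names of the MQTs this unit contains
    benefit: float       # total benefit, in resource time
    overhead: float      # total refresh overhead, in resource time
    sync_cost: float     # total synchronization cost, in resource time


def recommend(units, mqt_sizes, space_limit):
    """Greedily pick units by ROI = (benefit - overhead - sync_cost) / size."""
    placed = set()       # MQTs covered by the picks so far
    recommended = []     # names of picked MQTs and VCUs, in pick order
    remaining = list(units)
    free = space_limit

    while remaining:
        ranked = []
        for unit in remaining:
            # Assumed subsumption rule: only members not yet placed need space.
            extra = sum(mqt_sizes[m] for m in unit.members - placed)
            if extra == 0 or extra > free:
                continue  # fully subsumed by earlier picks, or does not fit
            net = unit.benefit - unit.overhead - unit.sync_cost
            ranked.append((net / extra, extra, unit))
        if not ranked:
            break  # nothing left that fits the space limit
        _, extra, best = max(ranked, key=lambda entry: entry[0])
        remaining.remove(best)
        recommended.append(best.name)
        placed |= best.members
        free -= extra
    return recommended


# Hypothetical figures, chosen only to exercise the loop.
sizes = {"A": 10, "B": 20, "C": 40}
candidates = [
    Unit("A", frozenset({"A"}), 100, 10, 10),
    Unit("B", frozenset({"B"}), 90, 20, 10),
    Unit("C", frozenset({"C"}), 200, 20, 20),
    Unit("A+B", frozenset({"A", "B"}), 240, 30, 20),  # VCU for one dependency
]
picks = recommend(candidates, sizes, space_limit=50)
```

With these figures the loop picks A first (ROI 8.0), then the A+B unit, whose ROI rises to 9.5 once the space for A is already paid for. B alone is then subsumed and C no longer fits in the remaining space.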

8. A computer program storage medium readable by computer, tangibly embodying a program of instructions executable by said computer to perform a method of placing materialized query tables (MQTs) in a distributed database system by improving load distribution and reducing network latency, said method comprising:
- inputting data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases;
  
  deriving candidate MQTs based on common parts of query statements;
  
  calculating a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time;
  
  deriving dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement;
  
  deriving a total benefit for each dependency in terms of said resource time;
  
  outputting a query statement, an MQT identification, and a total benefit, corresponding to said workload;
  
  inputting database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases;
  
  measuring a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost;
  
  performing a what-if data placement analysis comprising:
  
  creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MQT's size;
  
  for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and
  
  selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and
  
  re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and
  
  using said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
  
  creating frontend MQTs, based on created backend MQTs; and
  
  synchronizing said created frontend MQTs with said created backend MQTs.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The computer program storage medium of claim 8, wherein said frontend and said backend comprise heterogeneous databases.
  - 10. The computer program storage medium of claim 8, wherein a query execution efficiency comprises observed response time at said frontend and associated costs comprising computational central processing unit costs, input/output costs, and network communication costs.
  - 11. The computer program storage medium of claim 10, wherein said associated costs comprise statistically estimated costs.
  - 12. The computer program storage medium of claim 11, wherein said method further comprises running a MQT advisor at a frontend database, and considering said associated costs of at least one MQT placed at said frontend database.
  - 13. The computer program storage medium of claim 11, wherein said method further comprises running said MQTA at a non-frontend database.
  - 14. The computer program storage medium of claim 10, wherein the query execution efficiency considers all dependencies of all involved database instances and associated costs.
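
The benefit, overhead, and dependency steps recited above can be sketched as a single pass over the workload. This is a minimal illustration under stated assumptions, not the claimed method itself: the `summarize` function and its input shapes are hypothetical, the per-write refresh costs are assumed to be given, and crediting a multi-MQT query's saving to the dependency rather than to the individual MQTs is an assumed attribution rule.

```python
from collections import defaultdict


def summarize(read_queries, update_log, refresh_cost):
    """Aggregate per-MQT benefit, per-dependency benefit, and refresh overhead.

    read_queries: (time_without_mqts, time_with_mqts, mqts_used) tuples,
                  with times as estimated resource time.
    update_log:   names of the tables written, one entry per write.
    refresh_cost: {mqt: {table: refresh cost per write to that table}}.
    """
    mqt_benefit = defaultdict(float)
    dep_benefit = defaultdict(float)
    for time_without, time_with, mqts in read_queries:
        saving = time_without - time_with
        if len(mqts) == 1:
            mqt_benefit[mqts[0]] += saving
        elif len(mqts) > 1:
            # A dependency: several candidate MQTs co-used in one statement.
            dep_benefit[frozenset(mqts)] += saving

    overhead = defaultdict(float)
    for table in update_log:
        for mqt, per_table in refresh_cost.items():
            overhead[mqt] += per_table.get(table, 0.0)

    return dict(mqt_benefit), dict(dep_benefit), dict(overhead)


# Hypothetical workload, for illustration only.
reads = [
    (12.0, 4.0, ("M1",)),
    (9.0, 5.0, ("M1",)),
    (20.0, 6.0, ("M1", "M2")),
]
writes = ["orders", "orders", "customers"]
costs = {"M1": {"orders": 1.5}, "M2": {"orders": 0.5, "customers": 2.0}}
benefit, dependency_benefit, refresh_overhead = summarize(reads, writes, costs)
```

The three returned mappings correspond to the quantities the claim passes on to the placement analysis: a total benefit per candidate MQT, a total benefit per dependency, and a total refresh overhead per candidate MQT.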

15. A distributed database system for placing materialized query tables (MQTs) in said distributed database system by improving load distribution and reducing network latency, said distributed database system comprising:
- a memory, connected to a materialized query table advisor (MQTA) of said distributed databases system, to which a user inputs:
  
  data placement destinations for MQTs in a frontend database and backend databases, a workload comprising a database update log and a read workload, and a simulated catalog of backend databases; and
  
  a processor configured to:
  
  derive, by said MQTA, candidate MQTs based on common parts of query statements;
  
  calculate, by said MQTA, a total benefit for each instance of said candidate MQTs of all read queries of said workload in terms of resource time by comparing an estimated query processing time with and without said candidate MQTs, and a total overhead for refreshing said each candidate MQT for all write queries in said database update log in terms of said resource time;
  
  derive, by said MQTA, dependencies among said candidate MQTs, wherein a dependency indicates how multiple candidate MQTs are co-used in a single query statement;
  
  derive, by said MQTA, a total benefit for each dependency in terms of said resource time;
  
  output, by said MQTA, to a data placement advisor (DPA), a query statement, an MQT identification, and a total benefit, corresponding to said workload;
  
  input, by said user, to said DPA, database sizes allocated for said candidate MQTs in said frontend database and said backend databases, wherein a user specifies a space limit of said database sizes for any of said distributed database system, said frontend database, and said backend databases;
  
  measure, by said DPA, a synchronization cost for each of said candidate MQTs at said frontend database and said backend database, in terms of said resource time at said frontend to determine a total synchronization cost;
  
  perform, by said DPA, a what-if data placement analysis comprising:
  
  creating a ranked list of said candidate MQTs, based on return on investment (ROI), wherein said ROI is determined by dividing a net benefit, equal to said total benefit minus said total overhead minus said total synchronization cost, of each of said candidate MQTs by each of said candidate MQT's size;
  
  for each said dependency, creating a virtual caching unit (VCU), determining said ROI for each said VCU, and inserting said VCU in said ranked list; and
  
  selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, removing said selected MQT or VCU from said ranked list, and inserting said selected MQT or VCU into a recommended MQT list; and
  
  re-calculating ROIs of said ranked list, based on subsumption of a candidate MQT by a VCU, and selecting from said ranked list said candidate MQT or VCU having a highest ROI and fitting said space limit, and inserting said selected MQT or VCU into said recommended MQT list, until said space limit is exceeded; and
  
  use, by a data placement manager, said recommended MQT list, to reduce network latency in said distributed database system, by creating MQTs at said backend databases from said recommended MQT list;
  
  creating frontend MQTs, based on created backend MQTs; and
  
  synchronizing said created frontend MQTs with said created backend MQTs.
- Dependent claims (16, 17, 18, 19, 20, 21)
  - 16. The system of claim 15, wherein said frontend and said backend comprise heterogeneous databases.
  - 17. The system of claim 15, wherein a query execution efficiency comprises observed response time at said frontend and associated costs comprising computational central processing unit costs, input/output costs, and network communication costs.
  - 18. The system of claim 17, wherein said associated costs comprise statistically estimated costs.
  - 19. The system of claim 18, further comprising means for running a MQT advisor at a frontend database, and considering said associated costs of at least one MQT placed at said frontend database.
  - 20. The system of claim 18, further comprising means wherein said method further comprises running said MQTA at a non-frontend database.
  - 21. The system of claim 17, wherein the query execution efficiency considers all dependencies of all involved database instances and associated costs.
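
The final three steps of the independent claims (create backend MQTs, create frontend MQTs from them, then synchronize) can be illustrated with two in-memory SQLite databases. This is a rough sketch under loud assumptions: SQLite has no materialized query tables, so a plain summary table stands in for an MQT; the table names, the query, and both helper functions are hypothetical; and synchronization is shown as a full copy rather than any incremental scheme.

```python
import sqlite3

# Two in-memory databases stand in for a backend and a frontend.
backend = sqlite3.connect(":memory:")
frontend = sqlite3.connect(":memory:")

backend.execute("CREATE TABLE orders (region TEXT, amount REAL)")
backend.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.0)],
)

# Hypothetical recommended MQT: total order amount per region.
MQT_QUERY = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"


def refresh_backend_mqt(db):
    """(Re)build the summary table that emulates the backend MQT."""
    db.execute("DROP TABLE IF EXISTS mqt_sales")
    db.execute("CREATE TABLE mqt_sales AS " + MQT_QUERY)


def sync_frontend_mqt(src, dst):
    """Create the frontend copy if needed, then replace its rows from the backend."""
    rows = src.execute("SELECT region, total FROM mqt_sales").fetchall()
    dst.execute("CREATE TABLE IF NOT EXISTS mqt_sales (region TEXT, total REAL)")
    dst.execute("DELETE FROM mqt_sales")
    dst.executemany("INSERT INTO mqt_sales VALUES (?, ?)", rows)


refresh_backend_mqt(backend)           # create the MQT at the backend
sync_frontend_mqt(backend, frontend)   # create the frontend MQT from it

backend.execute("INSERT INTO orders VALUES ('west', 3.0)")  # a later write
refresh_backend_mqt(backend)
sync_frontend_mqt(backend, frontend)   # synchronize frontend with backend

result = dict(frontend.execute("SELECT region, total FROM mqt_sales"))
```

After the second synchronization the frontend copy reflects the later write, so reads served from the frontend need not cross the network to the backend.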

Current Assignee: X Corp. (f/k/a Twitter, Inc.) (X Holdings Corp.)
Original Assignee: International Business Machines Corporation
Inventors: Zilio, Daniele C., Li, Wen-Syan
Primary Examiner(s): Trujillo; James
Assistant Examiner(s): Somers; Marc
Application Number: US11/340,203
Publication Number: US 20070174292A1
Time in Patent Office: 1,524 Days
Field of Search: 707/4, 707/10, 707/5
US Class Current: 1/1
CPC Class Codes:
G06F 16/24539 using cached or materialise...
Y10S 707/99932 Access augmentation or opti...
