Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases

US 6,108,647 A
Filed: 05/21/1998
Issued: 08/22/2000
Est. Priority Date: 05/21/1998
Status: Expired due to Fees

First Claim

Patent Images

1. A computer implemented method for providing an approximate answer to a query in a stored relational database, the method comprising the steps of:

receiving said query containing input data to said stored relational database;

precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube;

summarizing said at least one sub-cube using a plurality of histogram techniques;

computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;

determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error;

calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and

outputting said approximate answer.

View all claims

3 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A novel and unique method of approximating the data cube and summarizing database data in order to provide quick and approximate answers to aggregate queries by precomputing a summary of the data cube using histograms and answering queries using the substantially smaller summary. A unique method according to the present invention provides for identifying accurate histogram classes and distributing the space among the histograms on various sub-cubes such that the errors are minimized, while at the same time computer resources are maximized.

170 Citations

30 Claims

1. A computer implemented method for providing an approximate answer to a query in a stored relational database, the method comprising the steps of:
- receiving said query containing input data to said stored relational database;
  
  precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube;
  
  summarizing said at least one sub-cube using a plurality of histogram techniques;
  
  computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;
  
  determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error;
  
  calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and
  
  outputting said approximate answer.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The computer implemented method for providing an approximate answer to a query as in claim 1, wherein said query is an aggregate query.
  - 3. The computer implemented method for providing an approximate answer to a query as in claim 2, wherein said computing step further comprises:
    - allocating a predetermined number of data subsets to a histogram on said at least one sub-cube having a first data configuration;
      
      generating a second data configuration on said at least one sub-cube using said predetermined number of data subsets;
      
      calculating an error value for each data configuration;
      
      subtracting said error values; and
      
      updating said first data configuration based on said subtracted error values.
  - 4. The computer implemented method for providing an approximate answer to a query as in claim 2, wherein said histogram technique is a direct technique.
  - 5. The computer implemented method for providing an approximate answer to a query as in claim 2, wherein said histogram technique is a slicing technique.
  - 6. The computer implemented method for providing an approximate answer to a query as in claim 2, wherein said histogram technique is a merging technique.
  - 7. The computer implemented method for providing an approximate answer to a query as in claim 2, wherein said calculating step further comprises:
    - counting a number of data subsets corresponding to the histogram technique corresponding to said maximum error/space benefit;
      
      allocating a new data subset to said histogram technique corresponding to said maximum error/space benefit when said number of data subsets is greater than zero;
      
      decreasing said number of data subsets by one; and
      
      repeating said counting, allocating and decreasing steps until an optimum allocation of data subsets is achieved when said number of data subsets equals zero.

8. A computer program recorded on a computer-readable medium, said computer program designed to operate a computer to provide an approximate answer to a query in a stored relational database by executing the steps of:
- receiving said query containing input data to said stored relational database;
  
  precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube;
  
  summarizing said at least one of sub-cube using a plurality of histogram techniques;
  
  computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;
  
  determining which histogram corresponds to a maximum of the error/space benefits, wherein said maximum of said error/space benefit corresponds to a summary with a minimum error;
  
  calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and
  
  outputting said approximate answer.
- View Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The computer program recorded on a computer-readable medium as in claim 8, wherein said query is an aggregate query.
  - 10. The computer program recorded on a computer-readable medium as in claim 9 wherein said computing step further comprises:
    - allocating a predetermined number of data subsets to a histogram on said at least one sub-cube having a first data configuration;
      
      generating a second data configuration on said at least one sub-cube using said predetermined number of data subsets;
      
      calculating an error value for each data configuration;
      
      subtracting said error values; and
      
      updating said first data configuration based on said subtracted error values.
  - 11. The computer program recorded on a computer-readable medium as in claim 9 wherein said histogram technique is a direct technique.
  - 12. The computer program recorded on a computer-readable medium as in claim 9 wherein said histogram technique is a slicing technique.
  - 13. The computer program recorded on a computer-readable medium as in claim 9 wherein said histogram technique is a merging technique.
  - 14. The computer program recorded on a computer-readable medium as in claim 9 wherein said approximate answer is calculated by further executing the steps of:
    - counting a number of data subsets corresponding to the histogram technique corresponding to said maximum error/space benefit;
      
      allocating a new data subset to said histogram technique corresponding to said maximum error/space benefit when said number of data subsets is greater than zero;
      
      decreasing said number of data subsets by one; and
      
      repeating said counting, allocating and decreasing steps until an optimum allocation of data subsets is achieved when said number of data subsets equals zero.

15. A computer program recorded on a computer-readable medium, said computer program designed to operate a computer to provide an approximate answer to an aggregate query in a relational database, by executing the steps of:
- receiving a query;
  
  precomputing a summary of a data cube corresponding to a stored relational database by defining at least one sub-cube;
  
  summarizing said at least one of sub-cube using a plurality of histogram techniques;
  
  computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;
  
  calculating a maximum of the error/space benefits, wherein said maximum of said error/space benefit corresponds to a summary with a minimum error; and
  
  providing an approximate answer to said query using the histogram technique corresponding to said maximum error/space benefit.
- View Dependent Claims (16, 17)
- - 16. The computer program recorded on a computer-readable medium as in claim 15, wherein said query is an aggregate query.
  - 17. The computer program recorded on a computer-readable medium as in claim 15, wherein said approximate answer is calculated by further executing the steps of:
    - counting a number of data subsets corresponding to the histogram technique corresponding to said maximum error/space benefit;
      
      allocating a new data subset to said histogram technique corresponding to said maximum error/space benefit when said number of data subsets is greater than zero;
      
      decreasing said number of data subsets by one; and
      
      repeating said counting, allocating and decreasing steps until an optimum allocation of data subsets is achieved when said number of data subsets equals zero.

18. A system comprising a programmed computer and a stored relational database, wherein said computer is programmed to maximize a benefit for a data sub-cube in said stored relational database, by executing the steps of:
- allocating a predetermined number of data subsets to a histogram on said subcube having a first data configuration;
  
  generating a second data configuration on said sub-cube using said predetermined number of data subsets;
  
  calculating an error value for each data configuration;
  
  subtracting said error values; and
  
  updating said first data configuration based on said subtracted error values, wherein said benefit is a minimum error.
- View Dependent Claims (19, 20, 21, 22, 23)
- - 19. The system comprising a programmed computer and a stored relational database as in claim 18, wherein said computing step further comprises:
    - allocating a predetermined number of data subsets to a histogram on said at least one sub-cube having a first data configuration;
      
      generating a second data configuration on said at least one sub-cube using said predetermined number of data subsets;
      
      calculating an error value for each data configuration;
      
      subtracting said error values; and
      
      updating said first data configuration based on said subtracted error values.
  - 20. The system comprising a programmed computer and a stored relational database as in claim 18, wherein said histogram technique a direct technique.
  - 21. The system comprising a programmed computer and a stored relational database as in claim 18, wherein said histogram technique a slicing technique.
  - 22. The system comprising a programmed computer and a stored relational database as in claim 18, wherein said histogram technique a merging technique.
  - 23. The computer program recorded on a computer-readable medium as in claim 18 wherein said approximate answer is calculated by further executing the steps of:
    - counting a number of data subsets corresponding to the histogram technique corresponding to said maximum error/space benefit;
      
      allocating a new data subset to said histogram technique corresponding to said maximum error/space benefit when said number of data subsets is greater than zero;
      
      decreasing said number of data subsets by one; and
      
      repeating said counting, allocating and decreasing steps until an optimum allocation of data subsets is achieved when said number of data subsets equals zero.

24. A computer system for providing an approximate answer to a query in a stored relational database, the system comprising:
- means for receiving said query containing input data to said stored relational database;
  
  means for precomputing a summary of a data cube corresponding to said stored relational database by defining at least one sub-cube;
  
  means for summarizing said at least one sub-cube using a plurality of histogram techniques;
  
  means for computing an error/space benefit for each summary of said at least one sub-cube corresponding to each histogram technique;
  
  means for determining a maximum of the error/space benefits, wherein said maximum error/space benefit corresponds to a summary with a minimum error;
  
  means for calculating an approximate answer to said query based on said input data using the histogram technique corresponding to said maximum error/space benefit; and
  
  means for outputting said approximate answer.
- View Dependent Claims (25, 26, 27, 28, 29)
- - 25. The computer system for providing an approximate answer to a query as in claim 24, wherein said query is an aggregate query.
  - 26. The computer system for providing an approximate answer to a query as in claim 25, wherein said computing step further comprises:
    - means for allocating a predetermined number of data subsets to a histogram on said at least one sub-cube having a first data configuration;
      
      means for generating a second data configuration on said at least one sub-cube using said predetermined number of data subsets;
      
      means for calculating an error value for each data configuration;
      
      means for subtracting said error values; and
      
      means for updating said first data configuration based on said subtracted error values.
  - 27. The computer system for providing an approximate answer to a query as in claim 25, wherein said histogram technique is a direct technique.
  - 28. The computer system for providing an approximate answer to a query as in claim 25, wherein said histogram technique is a slicing technique.
  - 29. The computer system for providing an approximate answer to a query as in claim 25, wherein said histogram technique is a merging technique.

30. A computer implemented method for data subset allocation within a histogram comprising the steps of:
- counting a number of data subsets corresponding to a histogram which has a maximum error/space benefit, wherein said maximum of said error/space benefit corresponds to a histogram with a minimum error;
  
  allocating a new data subset to said histogram corresponding to said maximum error/space benefit when said number of data subsets is greater than zero;
  
  decreasing said number of data subsets by one;
  
  repeating said counting, allocating and decreasing steps until said number of data subsets equals zero; and
  
  using said allocation of data subsets to achieve an optimum allocation of data subsets in said histogram.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Lucent Technologies, Inc. (Nokia Corporation)
Original Assignee
Lucent Technologies, Inc. (Nokia Corporation)
Inventors
Ganti, Venkatesh, Poosala, Viswanath
Primary Examiner(s)
Breene, John E.
Assistant Examiner(s)
LEWIS, CHERYL RENEA

Application Number

US09/082,057
Time in Patent Office

824 Days
Field of Search

707/1, 707/2, 707/6, 709/201, 702/180, 705/400
US Class Current

1/1
CPC Class Codes

G06F 16/24539   using cached or materialise...

G06F 16/2462   Approximate or statistical ...

G06Q 30/0283   Price estimation or determi...

Y10S 707/99931   Database or file accessing

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99936   Pattern matching access

Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

170 Citations

30 Claims

Specification

Solutions

Use Cases

Quick Links

Method, apparatus and programmed medium for approximating the data cube and obtaining approximate answers to queries in relational databases

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

170 Citations

30 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links