Partial pre-aggregation in relational database queries

US 7,133,858 B1
Filed: 06/30/2000
Issued: 11/07/2006
Est. Priority Date: 06/30/2000
Status: Expired due to Fees

First Claim

1. A method, implemented by a computing device, for processing a database query, comprising:

partially pre-aggregating records in a database to provide a result that contains at least two records having like grouping column values;

aggregating records derived from the result that contains at least two records having like grouping column values to provide a result that contains records having unique grouping column values; and

partially pre-aggregating the records in the database only if an estimation, based on a calculation of a probability that a record will be absorbed by a group of records already in memory, indicates that a number of records in the result that contains at least two records having like grouping column values is significantly less than a number of records in the database, wherein the estimation is based on factors comprising;

a number of output records, T(N);

a number of input records, N; and

a relationship
$T(N) = M + (N - M)\,(1 - A(R(M))) = M + (N - M) \sum_{i=1}^{D} (1 - p_i)^{R(M)}$;

wherein M records can fit into memory.
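
The estimation in the claim can be sketched in a few lines. The following is only an illustration under stated assumptions, not the patented optimizer: the function names, the `max_ratio` threshold standing in for "significantly less", and the clamp to N are all hypothetical. The claim text does not define R(M) or p_i here, so the sketch takes `r_m` as a caller-supplied value and treats `p` as per-group probabilities.

```python
from typing import Sequence


def estimate_output_records(n: int, m: int, p: Sequence[float], r_m: float) -> float:
    """Evaluate T(N) = M + (N - M) * sum_i (1 - p_i)^R(M) as transcribed in the claim.

    n   -- number of input records, N
    m   -- number of records that fit into memory, M
    p   -- p_1 .. p_D (assumed here to be per-group probabilities)
    r_m -- R(M), supplied by the caller because the claim text does not define it
    """
    if n <= m:
        # Assumption of this sketch: if every input record fits in memory,
        # nothing is pushed out early, so at most N records are output.
        return float(n)
    not_absorbed = sum((1.0 - p_i) ** r_m for p_i in p)  # the (1 - A(R(M))) term
    # Assumption of this sketch: the sum as transcribed is unweighted and can
    # exceed 1 for many groups, so the estimate is clamped to N.
    return min(float(n), m + (n - m) * not_absorbed)


def should_pre_aggregate(n: int, m: int, p: Sequence[float], r_m: float,
                         max_ratio: float = 0.5) -> bool:
    """Hypothetical decision rule: pre-aggregate only if T(N) <= max_ratio * N."""
    return estimate_output_records(n, m, p, r_m) <= max_ratio * n
```

For example, with N = 1,000,000, M = 1,000, two equally likely groups and R(M) = 10, `estimate_output_records(1_000_000, 1000, [0.5, 0.5], 10)` evaluates to about 2,951 output records, so the hypothetical rule would choose to pre-aggregate.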

Abstract

A partial pre-aggregation database operation improves processing efficiency of database queries by reducing the number of records input into a subsequent database operation, provided the query includes a final aggregation. A query optimizer is provided to determine when it is economical to partially pre-aggregate data records and when it is not. The partial pre-aggregation creates a record store in memory as input records are received. The record store is then used by another database operator, which saves the other database operator from having to re-create the record store.

26 Claims

1. A method, implemented by a computing device, for processing a database query, comprising:
- partially pre-aggregating records in a database to provide a result that contains at least two records having like grouping column values;
  
  aggregating records derived from the result that contains at least two records having like grouping column values to provide a result that contains records having unique grouping column values; and
  
  partially pre-aggregating the records in the database only if an estimation, based on a calculation of a probability that a record will be absorbed by a group of records already in memory, indicates that a number of records in the result that contains at least two records having like grouping column values is significantly less than a number of records in the database, wherein the estimation is based on factors comprising;
  
  a number of output records, T(N);
  
  a number of input records, N; and
  
  a relationship
  $T(N) = M + (N - M)\,(1 - A(R(M))) = M + (N - M) \sum_{i=1}^{D} (1 - p_i)^{R(M)}$;
  
  wherein M records can fit into memory.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16):
  - 2. The method as recited in claim 1, wherein the partially pre-aggregating further comprises:
    - maintaining a record store in memory, the record store having one record for each different grouping column value encountered in the operation;
      
      receiving a new record;
      
      combining the new record with a record having the same grouping column value, if such a record exists; and
      
      adding the new record to the record store in the memory if there is no record in the record store that has the same grouping column value as the new record.
  - 3. The method as recited in claim 2, further comprising:
    - adding additional new records to the record store until the record store reaches a capacity such that it can accept no new records; and
      
      outputting one or more records from the record store to a subsequent database operator.
  - 4. The method as recited in claim 3, wherein after the one or more records have been output to the subsequent database operator, the adding and outputting are repeated until there are no new records to process.
  - 5. The method as recited in claim 4, wherein any records remaining in the record store after there are no new records to process are output to the subsequent database operator.
  - 6. The method as recited in claim 3, wherein the subsequent database operator is a join.
  - 7. The method as recited in claim 1, wherein the partially pre-aggregating includes utilizing a hashing function.
  - 8. The method as recited in claim 1, wherein the partial pre-aggregating creates a record store in memory, and wherein the method further comprises utilizing the record store in memory for one or more other database operators.
  - 9. A computer programmed to perform the method recited in claim 1.
  - 12. The method of claim 1, wherein the estimation is based, in part, on an estimated absorption rate by which records are absorbed by records in memory.
  - 13. The method of claim 12, wherein the absorption rate is estimated, in part, based on a number of records expected to be processed.
  - 14. The method of claim 13, wherein the number of records expected to be processed is estimated, in part, based on a number of records that will fit in memory.
  - 15. The method of claim 1, wherein the number of input records, N, is known.
  - 16. The method of claim 1, wherein the number of input records, N, is estimated.
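
The record-store behavior recited in claims 2 through 5, together with the hashing of claim 7, can be sketched as a streaming operator. This is a minimal sketch under stated assumptions, not the patented implementation: the names `partial_pre_aggregate` and `final_aggregate` are hypothetical, the aggregate is a simple sum by default, and evicting the oldest-inserted record when the store is full is an arbitrary choice (the claims only say that one or more records are output).

```python
import operator
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

Record = Tuple[Hashable, float]  # (grouping column value, value to aggregate)


def partial_pre_aggregate(records: Iterable[Record], capacity: int,
                          combine: Callable[[float, float], float] = operator.add
                          ) -> Iterator[Record]:
    """Absorb records into a bounded in-memory store, emitting records when full."""
    store: Dict[Hashable, float] = {}  # hash table: one record per grouping value
    for key, value in records:
        if key in store:
            # A record with the same grouping value exists: combine (absorb).
            store[key] = combine(store[key], value)
            continue
        if len(store) >= capacity:
            # Store is full: output one record to the next operator to make room.
            # Assumption: evict the oldest-inserted entry (dicts keep insertion order).
            oldest = next(iter(store))
            yield oldest, store.pop(oldest)
        store[key] = value
    # No more input: output whatever remains in the store.
    yield from store.items()


def final_aggregate(records: Iterable[Record],
                    combine: Callable[[float, float], float] = operator.add
                    ) -> Dict[Hashable, float]:
    """Final aggregation: exactly one output record per grouping value."""
    totals: Dict[Hashable, float] = {}
    for key, value in records:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]
partial = list(partial_pre_aggregate(rows, capacity=2))
print(partial)                   # may still contain repeated grouping values
print(final_aggregate(partial))  # unique grouping values
```

With a capacity of two, the six input rows shrink to five partially aggregated rows in which "a" and "b" each appear twice; the final aggregation then produces one row per group.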

10. A relational database computer program stored on a computer-readable medium, the relational database computer program comprising computer-executable instructions that, when executed on a computer, perform steps comprising:
- receiving a stream of input records;
  
  partially pre-aggregating the input records according to a single grouping column to provide a result that contains at least two records having like grouping column values, wherein the partially pre-aggregating the input records is performed if an estimation, based on a calculation of a probability that a record will be absorbed by a group of records already in memory, indicates that a number of records in the result that contains at least two records having like grouping column values is significantly less than a number of records in the stream of input records, wherein the estimation is based on factors comprising;
  
  a number of output records, T(N);
  
  a number of input records, N; and
  
  a relationship;
  
  $T(N) = M + (N - M)\,(1 - A(R(M))) = M + (N - M) \sum_{i=1}^{D} (1 - p_i)^{R(M)}$;
  
  wherein M records can fit into memory;
  
  joining the partially pre-aggregated records with other data to create a record store; and
  
  aggregating records within the record store to provide a result that contains records having unique grouping column values.
- Dependent claims (11, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 11. The relational database computer program as recited in claim 10, wherein:
    - the record store has a capacity that is less than the number of records in the stream of input records; and
      
      the aggregating each input record is performed until the record store reaches capacity.
  - 17. The relational database as recited in claim 10, wherein the partially pre-aggregating further comprises:
    - maintaining a record store in memory, the record store having one record for each different grouping column value encountered in the operation;
      
      receiving a new record;
      
      combining the new record with a record having the same grouping column value, if such a record exists; and
      
      adding the new record to the record store in the memory if there is no record in the record store that has the same grouping column value as the new record.
  - 18. The relational database as recited in claim 17, wherein the steps further comprise:
    - adding additional new records to the record store until the record store reaches a capacity such that it can accept no new records; and
      
      outputting one or more records from the record store to a subsequent database operator.
  - 19. The relational database as recited in claim 18, wherein any records remaining in the record store after there are no new records to process are output to the subsequent database operator.
  - 20. The relational database as recited in claim 10, wherein the partially pre-aggregating includes utilizing a hashing function.
  - 21. The relational database as recited in claim 10, wherein the partial pre-aggregating creates a record store in memory, and wherein operation of the relational database further comprises steps utilizing the record store in memory for one or more other database operators.
  - 22. The relational database as recited in claim 10, wherein the estimation is based, in part, on an estimated absorption rate by which records are absorbed by records in memory.
  - 23. The relational database as recited in claim 22, wherein the estimated absorption rate is estimated, in part, based on a number of records expected to be processed.
  - 24. The relational database as recited in claim 23, wherein the number of records expected to be processed is estimated, in part, based on a number of records that will fit in memory.
  - 25. The relational database as recited in claim 10, wherein the number of input records, N, is known.
  - 26. The relational database as recited in claim 10, wherein the number of input records, N, is estimated.
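
The sequence of steps in claim 10 (pre-aggregate the input stream, join the reduced stream with other data, then aggregate) can be sketched as follows. This is an illustrative sketch only: the function names, the orders and customers data, the inner-join semantics, and the sum aggregate are all assumptions made for the example and do not come from the patent text.

```python
from typing import Dict, Iterable, Iterator, Tuple


def pre_aggregate(stream: Iterable[Tuple[int, float]], capacity: int
                  ) -> Iterator[Tuple[int, float]]:
    """Partially pre-aggregate (key, amount) pairs on a single grouping column."""
    store: Dict[int, float] = {}
    for key, amount in stream:
        if key in store:
            store[key] += amount          # absorbed by a record already in memory
            continue
        if len(store) >= capacity:
            oldest = next(iter(store))    # assumption: evict the oldest entry
            yield oldest, store.pop(oldest)
        store[key] = amount
    yield from store.items()              # flush the remainder at end of input


def join_with(partials: Iterable[Tuple[int, float]], lookup: Dict[int, str]
              ) -> Iterator[Tuple[int, str, float]]:
    """Inner join of the reduced stream with other data (here a key -> name map)."""
    for key, amount in partials:
        if key in lookup:
            yield key, lookup[key], amount


def aggregate(joined: Iterable[Tuple[int, str, float]]) -> Dict[int, Tuple[str, float]]:
    """Final aggregation: one result record per grouping column value."""
    totals: Dict[int, Tuple[str, float]] = {}
    for key, name, amount in joined:
        previous = totals.get(key)
        totals[key] = (name, amount if previous is None else previous[1] + amount)
    return totals


orders = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 1.0), (2, 4.0), (1, 7.5)]
customers = {1: "Ada", 2: "Grace"}
partials = list(pre_aggregate(orders, capacity=2))
print(len(orders), "input records ->", len(partials), "records reach the join")
print(aggregate(join_with(partials, customers)))
```

In this toy run the join sees four records instead of six, which is the kind of saving the pre-aggregation step is meant to deliver; the final aggregation still merges the repeated key so that each customer appears once.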

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Galindo-Legaria, Cesar A.; Larson, Per-Ake
Primary Examiner(s): Coby, Frantz
Assistant Examiner(s): Nguyen, Cindy
Application Number: US09/608,395
Time in Patent Office: 2,321 Days
Field of Search: 707/3, 707/2, 707/1
US Class Current: 1/1

CPC Class Codes
- G06F 16/24537: of operators
- G06F 16/24545: Selectivity estimation or d...
- G06F 16/24556: Aggregation; Duplicate elim...
- Y10S 707/99931: Database or file accessing
- Y10S 707/99932: Access augmentation or opti...
- Y10S 707/99933: Query processing, i.e. sear...
- Y10S 707/99934: Query formulation, input pr...
- Y10S 707/99935: Query augmenting and refini...
