Compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs

US 8,782,102 B2
Filed: 09/24/2010
Issued: 07/15/2014
Est. Priority Date: 09/24/2010
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A system is described for creating compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs. The system implements operations including computing a running aggregate for a group within a business intelligence (BI) query, and identifying a location to store running aggregate information within an aggregation working area of a cache. The aggregation working area includes first and second data structures. The first data structure stores running aggregate information that is associated with a group that is accessed frequently relative to a threshold. The second data structure stores running aggregate information that is associated with a group that is accessed infrequently relative to the threshold. The operations also include storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group.
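
The abstract's split between frequently and infrequently accessed groups can be illustrated with a minimal Python sketch. It assumes a SUM aggregate and per-group tuple counts that are known before aggregation starts (for example from statistics or a sampling pass; the text above does not say how they are obtained). It also treats "frequent" as "more tuples than the threshold", as in claim 8. `AggregationWorkingArea`, `consume` and `finalize` are hypothetical names. Infrequent groups keep only row identifiers, as in claim 14, and are summed at the end.

```python
from collections import defaultdict


class AggregationWorkingArea:
    """Hypothetical two-structure working area for a SUM aggregate."""

    def __init__(self, group_counts, tuple_threshold):
        # Assumed input: number of tuples touching each group, known in advance.
        # Frequent means strictly more tuples than the threshold (claim 8).
        self.frequent = {g for g, n in group_counts.items() if n > tuple_threshold}
        self.first = {g: 0 for g in self.frequent}  # group -> running aggregate
        self.second = defaultdict(list)             # group -> row identifiers

    def consume(self, row_id, group, value):
        if group in self.frequent:
            self.first[group] += value         # update the running aggregate in place
        else:
            self.second[group].append(row_id)  # remember only where the data lives

    def finalize(self, table):
        # Infrequent groups are aggregated late by revisiting their rows.
        result = dict(self.first)
        for group, row_ids in self.second.items():
            result[group] = sum(table[r][1] for r in row_ids)
        return result


table = [("a", 5), ("a", 7), ("b", 1), ("a", 2), ("c", 4)]  # (group, value) rows
area = AggregationWorkingArea({"a": 3, "b": 1, "c": 1}, tuple_threshold=1)
for row_id, (group, value) in enumerate(table):
    area.consume(row_id, group, value)
print(area.finalize(table))  # {'a': 14, 'b': 1, 'c': 4}
```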

8 Citations

20 Claims

1. A computer program product comprising a non-transitory computer useable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations comprising:
- computing a running aggregate for a group within a business intelligence (BI) query;
  
  identifying a location to store running aggregate information within an aggregation working area of a cache, wherein the aggregation working area comprises:
  
  a first data structure for storing running aggregate information that is associated with a group that is accessed frequently relative to a threshold; and
  
  a second data structure for storing running aggregate information that is associated with a group that is accessed infrequently relative to the threshold;
  
  estimating a final value of the running aggregate;
  
  estimating a number of bits for storing the final value of the running aggregate;
  
  allocating the estimated number of bits within either the first or second data structure; and
  
  storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The computer program product of claim 1, wherein:
    - the running aggregate comprises a value calculated during an aggregation operation; and
      
      the running aggregate information comprises information related to the running aggregate.
  - 3. The computer program product of claim 1, wherein execution of the computer readable program causes the computer to perform further operations comprising incrementally maintaining the running aggregate during a series of aggregation operations, wherein the aggregation operations comprise:
    - extracting a current value of the running aggregate from the allocated bits at the first location;
      
      casting the current value of the running aggregate to a standard data type of business data for which the running aggregate is computed;
      
      updating the current value of the running aggregate to reflect further aggregation with new business data;
      
      discarding extra bits from the updated value of the running aggregate; and
      
      storing the updated value of the running aggregate to the allocated bits at the first location.
  - 4. The computer program product of claim 3, wherein execution of the computer readable program causes the computer to perform further operations comprising handling an overflow condition of the allocated bits at the first location, wherein handling the overflow condition comprises:
    - detecting the overflow condition in response to a determination that the updated value of the running aggregate requires more bits than the allocated bits at the first location;
      
      storing the current value, prior to updating, of the running aggregate in another location of another data structure separate from the first and second data structures;
      
      resetting the current value of the running aggregate in the first data structure to zero; and
      
      storing an incremental value of the running aggregate in the first data structure.
  - 5. The computer program product of claim 1, wherein, in response to a determination that the group is accessed infrequently relative to the threshold, the running aggregate information comprises a location identifier within the second data structure.
  - 6. The computer program product of claim 5, wherein the location identifier within the second data structure comprises a group identifier.
  - 7. The computer program product of claim 1, wherein a total size of location identifiers associated with a group in the second data structure is smaller than a size of running aggregates associated with a group in the first data structure.
  - 8. The computer program product of claim 1, wherein the threshold comprises a tuple threshold, and the frequently accessed group has a first number of tuples touching the group that is greater than the tuple threshold, and the infrequently accessed group has a second number of tuples touching the group that is less than or equal to the tuple threshold.
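
Claims 1, 3 and 4 cover three steps: estimate the final value and size a field for it, update the field by extracting, casting, updating, truncating and storing, and spill on overflow. The sketch below illustrates them under stated assumptions and is not the patented implementation. It assumes a SUM of non-negative integers and bounds the final value by `max_value * expected_rows`. It models both the allocated bits and the separate overflow structure as plain integers. `estimate_bits` and `CompactSum` are invented names.

```python
def estimate_bits(max_value, expected_rows):
    """Width needed for an assumed upper bound on the final SUM."""
    return max(1, (max_value * expected_rows).bit_length())


class CompactSum:
    """One non-negative running SUM kept in a field of `bits` bits (hypothetical)."""

    def __init__(self, bits):
        self.bits = bits
        self.mask = (1 << bits) - 1
        self.slot = 0    # stands in for the allocated bits at the first location
        self.spill = 0   # stands in for the separate overflow data structure

    def add(self, value):
        # Extract the current value from the allocated bits. Python ints are
        # unbounded, so the cast to the wider working type is implicit here.
        current = self.slot & self.mask
        updated = current + value
        if updated.bit_length() > self.bits:
            # Overflow: move the pre-update value out, reset the field to zero,
            # and keep only the increment in the compact field.
            self.spill += current
            updated = value
            if updated.bit_length() > self.bits:
                raise OverflowError("the increment alone does not fit the field")
        # Discard bits above the field width (a no-op once the fit check passed)
        # and store the result back.
        self.slot = updated & self.mask

    def value(self):
        return self.spill + self.slot


s = CompactSum(estimate_bits(max_value=10, expected_rows=3))
for v in [10, 10, 10, 10]:  # one more row than the estimate assumed
    s.add(v)
print(s.bits, s.slot, s.spill, s.value())  # 5 10 30 40
```

The row-count estimate is one row short, so the fourth update overflows the 5-bit field. The pre-update value 30 moves to the spill structure and the field restarts from the increment. The reported total is still 40.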

9. A computer-implemented method for implementing an aggregation working area, the method comprising:
- computing a running aggregate for a group within a business intelligence (BI) query;
  
  identifying a location to store running aggregate information within the aggregation working area of a cache, wherein the aggregation working area comprises:
  
  a first data structure for storing running aggregate information that is associated with a group that is accessed frequently relative to a threshold; and
  
  a second data structure for storing running aggregate information that is associated with a group that is accessed infrequently relative to the threshold;
  
  wherein the threshold comprises a tuple threshold, and the frequently accessed group has a first number of tuples touching the group that is greater than the tuple threshold, and the infrequently accessed group has a second number of tuples touching the group that is less than or equal to the tuple threshold; and
  
  storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group.
- Dependent Claims (10, 11, 12, 13, 14)
  - 10. The computer-implemented method of claim 9, further comprising:
    - determining that the group is a frequently accessed group, wherein the running aggregate information comprises a value of the running aggregate;
      
      estimating a final value of the running aggregate;
      
      estimating a number of bits for storing the final value of the running aggregate; and
      
      allocating the estimated number of bits within the first data structure for storing the running aggregate, wherein the allocated bits for the running aggregate are located at a first location within the first data structure.
  - 11. The computer-implemented method of claim 9, further comprising:
    - tightly packing a plurality of running aggregates into the first data structure, wherein each of the plurality of running aggregates are located at adjacent locations using separately estimated numbers of bits, wherein each estimated number of bits is less than a standard number of bits for a working data type of the business data from which the corresponding running aggregate is computed.
  - 12. The computer-implemented method of claim 9, further comprising incrementally maintaining the running aggregate during a series of aggregation operations, wherein the aggregation operations comprise:
    - extracting a current value of the running aggregate from the allocated bits at the first location;
      
      casting the current value of the running aggregate to a standard data type of business data for which the running aggregate is computed;
      
      updating the current value of the running aggregate to reflect further aggregation with new business data;
      
      discarding extra bits from the updated value of the running aggregate; and
      
      storing the updated value of the running aggregate to the allocated bits at the first location.
  - 13. The computer-implemented method of claim 12, further comprising handling an overflow condition of the allocated bits at the first location, wherein handling the overflow condition comprises:
    - detecting the overflow condition in response to a determination that the updated value of the running aggregate requires more bits than the allocated bits at the first location;
      
      storing the current value, prior to updating, of the running aggregate in another location of another data structure separate from the first and second data structures;
      
      resetting the current value of the running aggregate in the first data structure to zero; and
      
      storing an incremental value of the running aggregate in the first data structure.
  - 14. The computer-implemented method of claim 9, further comprising:
    - determining that the group is an infrequently accessed group; and
      
      storing a row identifier as the running aggregate information in the second data structure of the aggregation working area.
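
Claim 11's tight packing places several running aggregates at adjacent bit offsets. Each aggregate gets its own estimated width, below the standard width of the working type. The minimal sketch below assumes a 64-bit working type and uses one Python integer as a stand-in for a cache-resident buffer. `PackedAggregates` and its methods are hypothetical.

```python
STANDARD_BITS = 64  # assumed width of the working data type


class PackedAggregates:
    """Hypothetical tightly packed array of running aggregates."""

    def __init__(self, widths):
        if any(w >= STANDARD_BITS for w in widths):
            raise ValueError("each field must be narrower than the working type")
        self.widths = list(widths)
        self.offsets = []
        pos = 0
        for w in self.widths:  # each field starts where the previous one ends
            self.offsets.append(pos)
            pos += w
        self.total_bits = pos
        self.buf = 0  # one Python int stands in for the packed buffer

    def get(self, i):
        return (self.buf >> self.offsets[i]) & ((1 << self.widths[i]) - 1)

    def set(self, i, value):
        mask = (1 << self.widths[i]) - 1
        if not 0 <= value <= mask:
            raise OverflowError(f"{value} does not fit in {self.widths[i]} bits")
        off = self.offsets[i]
        # Clear the field, then write the new value into the same bit range.
        self.buf = (self.buf & ~(mask << off)) | (value << off)

    def add(self, i, delta):
        self.set(i, self.get(i) + delta)


area = PackedAggregates([5, 12, 7])  # three aggregates, separately sized
area.add(0, 3)
area.add(1, 1000)
area.add(2, 99)
print(area.total_bits, [area.get(i) for i in range(3)])  # 24 [3, 1000, 99]
```

The three fields occupy 24 bits. Giving each aggregate a full 64-bit slot would take 192 bits.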

15. A system comprising:
- a multi-core processor, wherein each core is configured to run at least one thread;
  
  a cache within a caching hierarchy coupled to the multi-core processor, wherein the cache is configured to implement an aggregation working area for each thread, and each aggregation working area is configured to store running aggregate information associated with an aggregation function of a business intelligence (BI) platform;
  
  wherein the caching hierarchy is configured to implement separate data structures for aggregates of frequently and infrequently accessed groups within each aggregation working area; and
  
  wherein the caching hierarchy is further configured to implement a first data structure for storing running aggregate information that is associated with a group that is accessed frequently relative to a threshold, wherein the running aggregate information for the frequently accessed group comprises a value of a running aggregate computed according to the aggregation function.
- Dependent Claims (16, 17, 18)
  - 16. The system of claim 15, wherein the caching hierarchy is further configured to store different types of running aggregate information for the frequently and infrequently accessed groups.
  - 17. The system of claim 15, wherein the caching hierarchy is further configured to implement a second data structure for storing running aggregate information that is associated with a group that is accessed infrequently relative to the threshold, wherein the running aggregate information for the infrequently accessed group comprises a location identifier to indicate a location of corresponding business data within the BI platform.
  - 18. The system of claim 15, wherein the caching hierarchy is further configured to store the running aggregate information for the frequently and infrequently accessed groups in compressed formats that are smaller in bit number than a working data type of business data stored within the BI platform from which running aggregates are computed.
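
Claim 15 gives each thread its own aggregation working area. The sketch below shows that layout: each worker fills a private table, and the tables are merged once at the end, so shared groups never need locking. Several parts are assumptions not taken from the patent: the round-robin row partitioning, the merge step, and the names `aggregate_partition` and `parallel_sum`. In CPython the global interpreter lock also keeps these threads from running the loop in parallel, so the sketch demonstrates only the data layout.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_partition(rows):
    """One worker's private working area: no other thread touches it."""
    area = Counter()
    for group, value in rows:
        area[group] += value
    return area


def parallel_sum(rows, n_threads=4):
    # Round-robin split of the input; each slice goes to one worker thread.
    chunks = [rows[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(aggregate_partition, chunks))
    total = Counter()
    for partial in partials:  # merge the per-thread working areas once
        total.update(partial)  # Counter.update adds counts rather than replacing
    return dict(total)


rows = [("a", 5), ("b", 2), ("a", 7), ("c", 1), ("b", 3)]
print(parallel_sum(rows, n_threads=2))  # {'a': 12, 'b': 5, 'c': 1}
```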

19. A system comprising:
- a multi-core processor, wherein each core is configured to run at least one thread; and
  
  a cache within a caching hierarchy coupled to the multi-core processor, wherein the cache is configured to implement an aggregation working area for each thread, and each aggregation working area is configured to store running aggregate information associated with an aggregation function of a business intelligence (BI) platform, wherein the running aggregate information comprises information related to a running aggregate which comprises a value calculated according to the aggregation function;
  
  wherein the caching hierarchy is configured to implement a separate first data structure for aggregates of frequently accessed groups and a second data structure for aggregates of infrequently accessed groups, and to store a plurality of running aggregates into the first data structure for storing the aggregates associated with a group that is frequently accessed, wherein each of the plurality of running aggregates are located at adjacent locations using separately estimated numbers of bits, wherein each estimated number of bits is less than a standard number of bits for a working data type of the business data from which the corresponding running aggregate is computed.
- Dependent Claims (20)
  - 20. The system of claim 19, wherein the caching hierarchy is further configured to implement:
    - a first data structure for storing running aggregate information that is associated with a group that is accessed frequently relative to a threshold, wherein the running aggregate information for the frequently accessed group comprises a value of a running aggregate computed according to the aggregation function and stored in a compressed format that is smaller in bit number than a standard number of bits for a working data type of the business data from which the corresponding running aggregate is computed; and
      
      a second data structure for storing running aggregate information that is associated with a group that is accessed infrequently relative to the threshold, wherein the running aggregate information for the infrequently accessed group comprises a location identifier to indicate a location of corresponding business data within the BI platform, wherein the location identifier is smaller in bit number than the standard number of bits for the working data type of the business data from which the corresponding running aggregate is computed.
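
Claim 20 requires the location identifiers for infrequent groups to be narrower than the working data type. Assume a location identifier is a dense row number. Its minimum width then follows from the table size, which gives a rough size comparison against a layout that keeps one full 64-bit aggregate per group. All function names and the example sizes below are hypothetical.

```python
STANDARD_BITS = 64  # assumed width of the working data type


def row_id_bits(num_rows):
    """Smallest width that can address every row number 0..num_rows-1."""
    return max(1, (num_rows - 1).bit_length())


def working_area_bits(agg_widths, infrequent_tuples, num_rows):
    # Packed aggregates for frequent groups plus one row id per tuple
    # belonging to an infrequent group.
    return sum(agg_widths) + infrequent_tuples * row_id_bits(num_rows)


def standard_bits(n_groups):
    # Baseline: one full-width aggregate per group.
    return n_groups * STANDARD_BITS


# Made-up example: 100 frequent groups with 20-bit aggregates and
# 1,000 infrequent groups touched by one tuple each.
compact = working_area_bits([20] * 100, infrequent_tuples=1000, num_rows=1_000_000)
standard = standard_bits(n_groups=1100)
print(row_id_bits(1_000_000), compact, standard)  # 20 22000 70400
```

With a million rows, a row identifier needs 20 bits. In this made-up configuration the compact working area uses 22,000 bits, against 70,400 for the standard layout.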

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Qiao, Lin, Raman, Vijayshankar, Reiss, Frederick R
Primary Examiner(s)
PHILLIPS, III, ALBERT M

Application Number

US12/889,789
Publication Number

US 20120078980A1
Time in Patent Office

1,390 Days
Field of Search

707/812
US Class Current

707/812
CPC Class Codes

G06F 16/24556 Aggregation; Duplicate elim...

G06F 16/24561 Intermediate data storage t...
