Dynamically generating pre-aggregated datasets

US 8,521,774 B1
Filed: 08/20/2010
Issued: 08/27/2013
Est. Priority Date: 08/20/2010
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for dynamically generating and configuring pre-aggregated datasets optimized for responding to particular types of data requests made against a large sub-optimal multidimensional dataset are disclosed. A dynamic aggregator monitors the query types and response latencies associated with queries made against the large multidimensional dataset. The dynamic aggregator defines pre-aggregated datasets based on the types of queries received from users and calculates a respective benefit score for each pre-aggregated dataset. The benefit score of each pre-aggregated dataset can be based on the recorded latencies and query count for the pre-aggregated dataset. The dynamic aggregator can decide whether to generate and/or maintain particular pre-aggregated datasets based on the current values of the benefit scores associated with the particular pre-aggregated datasets.
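
As an illustration of what a pre-aggregated dataset buys, the following minimal sketch answers the same query once by scanning the full dataset and once from a roll-up. The row layout, the identifiers, and the function names are hypothetical and do not come from the patent; the roll-up by advertiser is only one possible pre-aggregation.

```python
from collections import defaultdict

# Hypothetical rows of a multidimensional dataset:
# (advertiser_id, campaign_id, day, clicks)
ROWS = [
    ("adv1", "c1", "2010-08-01", 3),
    ("adv1", "c1", "2010-08-02", 5),
    ("adv1", "c2", "2010-08-01", 2),
    ("adv2", "c3", "2010-08-01", 7),
]


def clicks_from_full(rows, advertiser_id):
    """Answer the query by scanning every row of the full dataset."""
    return sum(clicks for adv, _campaign, _day, clicks in rows if adv == advertiser_id)


def build_preaggregate(rows):
    """Roll clicks up to one total per advertiser (the pre-aggregated dataset)."""
    totals = defaultdict(int)
    for adv, _campaign, _day, clicks in rows:
        totals[adv] += clicks
    return dict(totals)


def clicks_from_preaggregate(pre, advertiser_id):
    """Answer the same query with a single lookup instead of a scan."""
    return pre.get(advertiser_id, 0)


pre = build_preaggregate(ROWS)
```

Both paths return the same value for a given advertiser; the difference the patent cares about is the latency of each path, which grows with the size of the full dataset for the scan but not for the lookup.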

18 Claims

1. A computer-implemented method performed by one or more data processing apparatus, comprising:
- receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
  
  responding to the first data request, wherein the responding comprises:
  
  deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
  
  determining a first latency for responding to the first data request using the multidimensional dataset, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
  
  defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
  
  calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
  
  determining that the benefit score meets a predetermined threshold value; and
  
  storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.
- Dependent claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1, further comprising:
    - receiving a second data request, the second data request specifying a type of aggregated data being requested and one or more criteria for selecting data from the multidimensional dataset to derive a value for the type of aggregated data requested by the second data request;
      
      determining that using the pre-aggregated dataset to derive the value for the type of aggregated data requested by the second data request lowers a latency for responding to the second data request relative to a latency for responding to the second data request using the multidimensional dataset to derive the value for the type of aggregated data requested by the second data request; and
      
      increasing the benefit score for the pre-aggregated dataset, in response to determining that using the pre-aggregated dataset to derive the value for the type of aggregated data requested by the second data request lowers the latency for responding to the second data request.
  - 3. The method of claim 2, wherein the determining that the benefit score meets a predetermined threshold value is based on the increased benefit score.
  - 4. The method of claim 1, further comprising:
    - after storing the pre-aggregated dataset, receiving a second data request, the second data request specifying a type of aggregated data being requested and one or more criteria for selecting data from the multidimensional dataset to derive a value for the type of aggregated data requested by the second data request;
      
      determining that using the pre-aggregated dataset to derive the value for the type of aggregated data requested by the second data request lowers a latency for responding to the second data request relative to a latency for responding to the second data request using the multidimensional dataset to derive the value for the type of aggregated data requested by the second data request;
      
      deriving the value for the type of aggregated data requested by the second data request using the pre-aggregated data from the pre-aggregated dataset; and
      
      providing the value derived from the pre-aggregated dataset to a requesting user device of the second data request.
  - 5. The method of claim 1, further comprising:
    - updating the pre-aggregated dataset using data newly added to the multidimensional dataset;
      
      determining an amount of resources required for storing and maintaining the pre-aggregated dataset based on resources used for updating the pre-aggregated dataset and storing the updated pre-aggregated dataset; and
      
      decreasing the benefit score for the pre-aggregated dataset based on the amount of resources required for storing and maintaining the pre-aggregated dataset.
  - 6. The method of claim 5, further comprising:
    - determining that the decreased benefit score is below the predetermined threshold value; and
      
      discarding the pre-aggregated dataset upon determining that the decreased benefit score is below the predetermined threshold value.
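
The score lifecycle in claims 1 through 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the threshold value, the use of the raw latency difference as the score, the size of the increment for a later request, and the idea that resource cost is expressed in the same unit as latency are all hypothetical choices. The names `Candidate`, `on_first_request`, `on_later_request`, and `on_maintenance` are invented for the example.

```python
from dataclasses import dataclass

THRESHOLD = 1.0  # assumed "predetermined threshold value", in seconds saved


@dataclass
class Candidate:
    """Bookkeeping for one defined pre-aggregated dataset."""
    benefit_score: float = 0.0
    stored: bool = False


def _store_if_worthwhile(c: Candidate) -> None:
    # Claim 1: store the dataset once the score meets the threshold.
    if c.benefit_score >= THRESHOLD:
        c.stored = True


def on_first_request(c: Candidate, first_latency: float, second_latency: float) -> None:
    # Claim 1: the score is based on the difference between the two latencies.
    c.benefit_score = first_latency - second_latency
    _store_if_worthwhile(c)


def on_later_request(c: Candidate, full_latency: float, preagg_latency: float) -> None:
    # Claims 2 and 3: a later request that the dataset would speed up
    # increases the score, and the threshold check uses the increased score.
    if preagg_latency < full_latency:
        c.benefit_score += full_latency - preagg_latency  # increment size is an assumption
    _store_if_worthwhile(c)


def on_maintenance(c: Candidate, resource_cost: float) -> None:
    # Claims 5 and 6: maintenance cost decreases the score, and the dataset
    # is discarded once the decreased score falls below the threshold.
    c.benefit_score -= resource_cost
    if c.benefit_score < THRESHOLD:
        c.stored = False
```

With these assumptions, a dataset that saves 0.5 s on the first request is not stored, a second request that saves 1.0 s pushes the score to 1.5 and triggers storage, and a maintenance cost of 1.0 drops the score back to 0.5 and causes the dataset to be discarded.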

7. A computer-implemented method, comprising:
- monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the requested type of aggregated data;
  
  based on the received data requests, defining one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset wherein each optimized dataset is defined based on data required to derive the respective value for the type of aggregated data requested by at least one of the received data requests;
  
  upon receipt of a data request:
  
  determining that a respective optimized dataset has not been stored for the type of aggregated data being requested by the data request;
  
  in response to determining that the respective optimized dataset has not been stored:
  
  deriving the respective value for the type of aggregated data requested by the data request, the value being derived based on data selected directly from the multidimensional dataset;
  
  determining a first latency, the first latency being indicative of an amount of time, after receipt of the data request, that is required to derive the respective value using the multidimensional dataset; and
  
  defining an optimized dataset based on the data request, the optimized dataset being a proper subset of the multidimensional dataset used to derive the respective value; and
  
  calculating a benefit score for the optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to derive the respective value using the optimized dataset, and the respective request count maintained for the respective optimized dataset.
- Dependent claims (8)
- - 8. The method of claim 7, further comprising:
    - for each of one or more stored optimized datasets:
      
      updating the optimized dataset using data newly added to the multidimensional dataset;
      
      determining an amount of resources required for storing and for maintaining the optimized dataset based on resources used for updating the optimized dataset and storing the updated optimized dataset;
      
      decreasing the respective benefit score for the respective optimized dataset based on the amount of resources required for storing and for maintaining the optimized dataset;
      
      determining that the decreased benefit score is below a predetermined threshold; and
      
      discarding the respective optimized dataset upon determining that the respective benefit score of the optimized dataset is below the predetermined threshold.
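
Claim 7 bases the benefit score on both the latency difference and a request count. One way to combine them, shown here purely as an assumption, is to multiply the per-request saving by the number of requests seen for that type. The names `benefit_score`, `handle_request`, `request_counts`, and `stored` are hypothetical, and the latencies are passed in as plain numbers rather than measured.

```python
from collections import Counter

request_counts = Counter()  # request count per query type
stored = {}                 # query type -> stored optimized dataset


def benefit_score(first_latency, second_latency, request_count):
    """Assumed form: per-request latency saving times the number of requests."""
    return (first_latency - second_latency) * request_count


def handle_request(query_type, first_latency, estimated_second_latency):
    """Return the benefit score when no optimized dataset is stored, else None.

    first_latency stands for the time taken to answer from the full
    multidimensional dataset; estimated_second_latency stands for the time
    an optimized dataset is expected to take.
    """
    request_counts[query_type] += 1
    if query_type in stored:
        # An optimized dataset already exists; this sketch does no scoring here.
        return None
    return benefit_score(
        first_latency, estimated_second_latency, request_counts[query_type]
    )
```

Under this assumed formula a rarely issued query with a large saving and a frequently issued query with a small saving can reach the same score, which is the trade-off a count-weighted score is meant to capture.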

9. A computer-implemented method, comprising:
- monitoring respective query types and first response times associated with queries made against a multidimensional dataset, each query being received from a user of the multidimensional dataset, a response time being a time interval between receiving a query and providing a response to the query;
  
  for each of one or more query types:
  
  determining, for one or more query types, that a pre-aggregated response time resulting from using a pre-aggregated dataset to respond to the query type is lower than the first response time resulting from using the multidimensional dataset to respond to the query type, the pre-aggregated dataset being a proper subset of the multidimensional dataset;
  
  in response to determining that the pre-aggregated response time is lower than the first response time, storing the pre-aggregated dataset for the query type, the pre-aggregated dataset including pre-aggregated values derived from the multidimensional dataset; and
  
  dynamically updating a respective benefit score for each pre-aggregated dataset based on a current count and recorded pre-aggregated response time associated with the queries that are of the associated query type of the pre-aggregated dataset.
- Dependent claims (10, 11, 12, 13, 14)
- - 10. The method of claim 9, further comprising defining one or more pre-aggregated datasets, wherein defining the one or more pre-aggregated datasets comprises:
    - recording data ranges and dimensions of the multidimensional dataset that were accessed during preparation of the respective response for each of the received queries;
      
      recording types of data aggregation performed during preparation of the respective response for each of the received queries;
      
      analyzing the recorded data ranges and dimensions that were accessed and the recorded types of data aggregations that were performed for each type of received queries; and
      
      preparing a respective definition for the pre-aggregated dataset for each type of received queries based on a result of the analyzing, wherein the pre-aggregated dataset prepared according to the definition includes values derived using at least some of the recorded types of data aggregation and performed on at least some of the recorded data ranges and dimensions.
  - 11. The method of claim 9, wherein each query type specifies a type of performance metric calculated by aggregating data in one or more dimensions of the multidimensional dataset.
  - 12. The method of claim 9, wherein each query type specifies respective values or value ranges for one or more dimensions of the multidimensional dataset that are used to select data from the multidimensional dataset for calculating a type of performance metric.
  - 13. The method of claim 12, wherein the type of performance metric includes one of a count of a specified type of user interaction events, a financial value associated with a specified type of user interaction events, and a time value associated with a specified type of user interaction events.
  - 14. The method of claim 13, wherein the one or more dimensions includes one or more of an advertiser identifier, a user identifier, an ad campaign identifier, an ad group identifier, a keyword, a creative identifier, a delivery period, a conversion event, a click event, an impression event, a search event, a bid value, a conversion value, and a timestamp.
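
The definition step of claim 10 (record what each query touched, analyze the records, prepare a definition) can be sketched as below. The record layout, the choice to take the union of dimensions and the widest date range, and the names `AccessRecord` and `prepare_definition` are assumptions made for illustration; the claim itself only requires that the definition use at least some of the recorded ranges, dimensions, and aggregation types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """What one answered query touched in the multidimensional dataset."""
    query_type: str
    dimensions: frozenset  # dimension names accessed
    date_range: tuple      # (start, end) as ISO date strings
    aggregation: str       # e.g. "sum" or "count"


def prepare_definition(records, query_type):
    """Build a pre-aggregated dataset definition for one query type.

    Assumed policy: cover every dimension and the widest date range that
    any recorded query of this type accessed.
    """
    matching = [r for r in records if r.query_type == query_type]
    if not matching:
        return None
    dims = frozenset().union(*(r.dimensions for r in matching))
    return {
        "dimensions": sorted(dims),
        "date_range": (
            min(r.date_range[0] for r in matching),
            max(r.date_range[1] for r in matching),
        ),
        "aggregations": sorted({r.aggregation for r in matching}),
    }


# Hypothetical access log for two query types.
LOG = [
    AccessRecord("clicks", frozenset({"campaign_id"}), ("2010-08-01", "2010-08-07"), "sum"),
    AccessRecord("clicks", frozenset({"campaign_id", "keyword"}), ("2010-08-05", "2010-08-20"), "sum"),
    AccessRecord("spend", frozenset({"advertiser_id"}), ("2010-08-01", "2010-08-31"), "sum"),
]
```

Calling `prepare_definition(LOG, "clicks")` yields a definition covering `campaign_id` and `keyword` over 2010-08-01 through 2010-08-20 with a `sum` aggregation. A narrower policy (for example, only the most frequently accessed dimensions) would equally satisfy the "at least some" wording.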

15. A computer storage device having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
- receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
  
  responding to the first data request, wherein the responding comprises:
  
  deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
  
  determining a first latency, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
  
  defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
  
  calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
  
  determining that the benefit score meets a predetermined threshold value; and
  
  storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.

16. A system, comprising:
- one or more processors; and
  
  memory coupled to the one or more processors and having instructions stored thereon, the instructions, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
  
  responding to the first data request, wherein the responding comprises:
  
  deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
  
  determining a first latency for responding to the first data request using the multidimensional dataset, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
  
  defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
  
  calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
  
  determining that the benefit score meets a predetermined threshold value; and
  
  storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.

17. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising:
- monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the type of aggregated data requested;
  
  based on the received data requests, defining, by a processor, one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset each adapted for deriving the respective value for the type of aggregated data requested by at least one of the received data requests;
  
  maintaining a respective request count for each of the optimized datasets, the respective count tallying the received data requests for which the optimized dataset is adapted to provide improved performance in value derivation as compared to the multidimensional dataset;
  
  upon receipt of each of the data requests:
  
  determining whether a respective optimized dataset has been generated for the type of aggregated data requested by the data request;
  
  if the respective optimized dataset has not been generated:
  
  deriving the respective value for the type of aggregated data requested by the data request based on the data selected directly from the multidimensional dataset;
  
  determining, by a processor, a first latency for responding to the data request using the multidimensional dataset, the latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset; and
  
  calculating, by a processor, a benefit score for the respective optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the respective optimized dataset and the respective request count maintained for the respective optimized dataset.

18. A system, comprising:
- one or more processors; and
  
  memory coupled to the one or more processors and having instructions stored thereon, the instructions, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the requested type of aggregated data;
  
  based on the received data requests, defining one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset each adapted for deriving the respective value for the type of aggregated data requested by at least one of the received data requests;
  
  maintaining a respective request count for each of the optimized datasets, the respective count tallying the received data requests for which the optimized dataset is adapted to provide improved performance in value derivation as compared to the multidimensional dataset;
  
  upon receipt of each of the data requests:
  
  determining that a respective optimized dataset has not been stored for the type of aggregated data being requested by the data request;
  
  in response to determining that the respective optimized dataset has not been stored:
  
  deriving the respective value for the type of aggregated data requested by the data request, the value being derived based on data selected directly from the multidimensional dataset;
  
  determining a first latency, the first latency being indicative of an amount of time, after receipt of the data request, that is required to derive the respective value using the multidimensional dataset; and
  
  defining an optimized dataset based on the data request, the optimized dataset being a proper subset of the multidimensional dataset used to derive the respective value; and
  
  calculating a benefit score for the optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to derive the respective value using the optimized dataset, and the respective request count maintained for the optimized dataset.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Cai, Chao; Ewald, Eric W.; Tangney, Cameron M.; Nandy, Sagnik
Primary Examiner(s)
VU, BAI DUC

Application Number
US12/860,328
Time in Patent Office
1,103 Days
Field of Search
None
US Class Current
707/776
CPC Class Codes
G06F 16/24   Querying
G06F 16/24539   using cached or materialise...
G06F 16/24556   Aggregation; Duplicate elim...
G06F 16/24578   using ranking
G06Q 30/0242   Determining effectiveness o...
