Dynamically generating pre-aggregated datasets
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for dynamically generating and configuring pre-aggregated datasets optimized for responding to particular types of data requests made against a large sub-optimal multidimensional dataset are disclosed. A dynamic aggregator monitors the query types and response latencies associated with queries made against the large multidimensional dataset. The dynamic aggregator defines pre-aggregated datasets based on the types of queries received from users and calculates a respective benefit score for each pre-aggregated dataset. The benefit score of each pre-aggregated dataset can be based on the recorded latencies and query count for the pre-aggregated dataset. The dynamic aggregator can decide whether to generate and/or maintain particular pre-aggregated datasets based on the current values of the benefit scores associated with the particular pre-aggregated datasets.
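The generate-or-maintain decision described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `decide`, the four action labels, the dataset names, and the threshold value are all hypothetical, and the benefit scores are taken as given inputs.

```python
def decide(benefit_score, is_stored, threshold):
    """Return the action for one pre-aggregated dataset (hypothetical policy)."""
    if benefit_score >= threshold:
        # Worth having: keep it if it exists, otherwise build it.
        return "maintain" if is_stored else "generate"
    # Not worth having: discard it if it exists, otherwise do nothing.
    return "drop" if is_stored else "skip"


# Illustrative current scores as (score, already stored?) pairs. In practice
# the scores would be recomputed as latencies and query counts are recorded.
current = {
    "revenue_by_country": (60.0, False),
    "clicks_by_hour": (0.5, True),
    "sessions_by_device": (12.0, True),
}
THRESHOLD = 10.0  # assumed value, not taken from the source

actions = {name: decide(score, stored, THRESHOLD)
           for name, (score, stored) in current.items()}
print(actions)
```

Because the decision depends only on the current score, a dataset that was once worth storing can later be dropped when its score falls below the threshold.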
18 Claims
1. A computer-implemented method performed by one or more data processing apparatus, comprising:
receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
responding to the first data request, wherein the responding comprises:
deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
determining a first latency for responding to the first data request using the multidimensional dataset, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
determining that the benefit score meets a predetermined threshold value; and
storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.
Dependent claims: 2, 3, 4, 5, 6.
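The steps of claim 1 can be walked through on a toy dataset. The sketch below is an assumption-laden illustration only: the multidimensional dataset is modelled as a list of row dictionaries, the aggregate is a plain sum, the pre-aggregated dataset is modelled as the matching rows projected onto the needed columns, and the helper names (`derive_sum`, `timed`, `define_preaggregate`, `should_store`) are hypothetical.

```python
import time


def derive_sum(rows, measure, criteria):
    """Sum `measure` over rows matching every key/value pair in `criteria`."""
    return sum(r[measure] for r in rows
               if all(r.get(k) == v for k, v in criteria.items()))


def timed(fn, *args):
    """Run fn(*args) and return (value, elapsed seconds)."""
    start = time.perf_counter()
    value = fn(*args)
    return value, time.perf_counter() - start


def define_preaggregate(rows, measure, criteria):
    """Keep only matching rows and only the columns the request needs."""
    cols = set(criteria) | {measure}
    return [{k: r[k] for k in cols} for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]


def should_store(first_latency, second_latency, threshold):
    """Benefit score is the latency difference; store if it meets the threshold."""
    benefit = first_latency - second_latency
    return benefit >= threshold, benefit


rows = [
    {"country": "US", "device": "mobile", "revenue": 10.0},
    {"country": "US", "device": "desktop", "revenue": 5.0},
    {"country": "DE", "device": "mobile", "revenue": 7.0},
]
criteria = {"country": "US"}

# Respond from the full dataset and record the first latency.
first_value, first_latency = timed(derive_sum, rows, "revenue", criteria)

# Define the candidate subset and measure the second latency against it.
subset = define_preaggregate(rows, "revenue", criteria)
second_value, second_latency = timed(derive_sum, subset, "revenue", criteria)

stored = {}
store, benefit = should_store(first_latency, second_latency, threshold=0.0)
if store:
    stored[("sum", "revenue", tuple(sorted(criteria.items())))] = subset
```

On three rows the measured latencies are dominated by timer noise, so whether the subset ends up stored here is not meaningful; the point is the order of the steps. The claim does not say how the second latency is obtained, and measuring it directly, as done here, is one assumed option.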
7. A computer-implemented method, comprising:
monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the requested type of aggregated data;
based on the received data requests, defining one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset, wherein each optimized dataset is defined based on data required to derive the respective value for the type of aggregated data requested by at least one of the received data requests;
upon receipt of a data request:
determining that a respective optimized dataset has not been stored for the type of aggregated data being requested by the data request;
in response to determining that the respective optimized dataset has not been stored:
deriving the respective value for the type of aggregated data requested by the data request, the value being derived based on data selected directly from the multidimensional dataset;
determining a first latency, the first latency being indicative of an amount of time, after receipt of the data request, that is required to derive the respective value using the multidimensional dataset; and
defining an optimized dataset based on the data request, the optimized dataset being a proper subset of the multidimensional dataset used to derive the respective value; and
calculating a benefit score for the optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to derive the respective value using the optimized dataset, and the respective request count maintained for the respective optimized dataset.
Dependent claims: 8.
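Claim 7 adds two ideas: checking whether an optimized dataset is already stored for a request type, and folding a request count into the benefit score. A minimal sketch follows. The claim does not specify how the latency difference and the count are combined, so the product used here is an assumption, as are the names `route`, `benefit_score`, `request_counts`, and `stored`.

```python
from collections import Counter

request_counts = Counter()   # request type -> number of requests seen
stored = {}                  # request type -> stored optimized dataset


def route(request_type):
    """Tally the request and report which dataset would serve it."""
    request_counts[request_type] += 1
    return "optimized" if request_type in stored else "base"


def benefit_score(first_latency, second_latency, request_count):
    """Assumed combination: latency saved per request times request count."""
    return (first_latency - second_latency) * request_count


# Three requests of the same type arrive before anything is stored.
sources = [route("revenue_by_country") for _ in range(3)]

# Latencies in seconds are illustrative values, not measurements.
score = benefit_score(2.0, 0.5, request_counts["revenue_by_country"])
print(sources, score)
```

Weighting by the count means a modest per-request saving can still justify an optimized dataset when the request type is frequent.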
9. A computer-implemented method, comprising:
monitoring respective query types and first response times associated with queries made against a multidimensional dataset, each query being received from a user of the multidimensional dataset, a response time being a time interval between receiving a query and providing a response to the query;
for each of one or more query types:
determining, for one or more query types, that a pre-aggregated response time resulting from using a pre-aggregated dataset to respond to the query type is lower than the first response time resulting from using the multidimensional dataset to respond to the query type, the pre-aggregated dataset being a proper subset of the multidimensional dataset;
in response to determining that the pre-aggregated response time is lower than the first response time, storing the pre-aggregated dataset for the query type, the pre-aggregated dataset including pre-aggregated values derived from the multidimensional dataset; and
dynamically updating a respective benefit score for each pre-aggregated dataset based on a current count and recorded pre-aggregated response time associated with the queries that are of the associated query type of the pre-aggregated dataset.
Dependent claims: 10, 11, 12, 13, 14.
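The dynamic update in the last step of claim 9 can be illustrated with a running mean of recorded response times. This is a sketch under assumptions: the class name `QueryTypeStats`, the use of a running mean, and the score formula (count times average time saved) are not taken from the source.

```python
class QueryTypeStats:
    """Hypothetical per-query-type record for one pre-aggregated dataset."""

    def __init__(self, first_response_time):
        self.first_response_time = first_response_time  # against full dataset
        self.count = 0
        self.mean_preagg_time = 0.0

    def record(self, preagg_time):
        """Fold one recorded response time in and return the updated score."""
        self.count += 1
        # Incremental mean: avoids keeping every recorded time in memory.
        self.mean_preagg_time += (preagg_time - self.mean_preagg_time) / self.count
        return self.benefit_score()

    def benefit_score(self):
        # Assumed formula: queries served times average time saved per query.
        return self.count * (self.first_response_time - self.mean_preagg_time)


stats = QueryTypeStats(first_response_time=2.0)
score_after_one = stats.record(0.5)
score_after_two = stats.record(1.0)
print(score_after_one, score_after_two)
```

Each new query both raises the count and shifts the mean, so the score tracks current usage rather than the conditions under which the dataset was first stored.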
15. A computer storage device having instructions stored thereon, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
responding to the first data request, wherein the responding comprises:
deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
determining a first latency, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
determining that the benefit score meets a predetermined threshold value; and
storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.
16. A system, comprising:
one or more processors; and
memory coupled to the one or more processors and having instructions stored thereon, the instructions, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving a first data request from a user device, the first data request specifying a first type of aggregated data being requested and one or more first criteria for selecting data from a multidimensional dataset to derive a first value for the first type of aggregated data;
responding to the first data request, wherein the responding comprises:
deriving the first value for the first type of aggregated data using the data selected from the multidimensional dataset according to the one or more first criteria; and
determining a first latency for responding to the first data request using the multidimensional dataset, the first latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset;
defining a pre-aggregated dataset based on the first data request, the pre-aggregated dataset being a proper subset of the multidimensional dataset used to derive the first value;
calculating a benefit score for the pre-aggregated dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the pre-aggregated dataset;
determining that the benefit score meets a predetermined threshold value; and
storing the pre-aggregated dataset from the multidimensional dataset upon determining that the benefit score of the pre-aggregated dataset meets the predetermined threshold value.
17. A non-transitory computer-readable medium having instructions stored thereon, the instructions, when executed by one or more processors, cause the one or more processors to perform operations comprising:
monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the type of aggregated data requested;
based on the received data requests, defining, by a processor, one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset each adapted for deriving the respective value for the type of aggregated data requested by at least one of the received data requests;
maintaining a respective request count for each of the optimized datasets, the respective count tallying the received data requests for which the optimized dataset is adapted to provide improved performance in value derivation as compared to the multidimensional dataset;
upon receipt of each of the data requests:
determining whether a respective optimized dataset has been generated for the type of aggregated data requested by the data request;
if the respective optimized dataset has not been generated:
deriving the respective value for the type of aggregated data requested by the data request based on the data selected directly from the multidimensional dataset;
determining, by a processor, a first latency for responding to the data request using the multidimensional dataset, the latency being indicative of an amount of time, after receipt of the first data request, that is required to derive the first value using the multidimensional dataset; and
calculating, by a processor, a benefit score for the respective optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to respond to the first data request using the respective optimized dataset and the respective request count maintained for the respective optimized dataset.
18. A system, comprising:
one or more processors; and
memory coupled to the one or more processors and having instructions stored thereon, the instructions, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
monitoring data requests received from user devices, each data request specifying a respective type of aggregated data being requested and one or more respective criteria for selecting data from a multidimensional dataset to derive a respective value for the requested type of aggregated data;
based on the received data requests, defining one or more optimized datasets, the optimized datasets being proper subsets of the multidimensional dataset each adapted for deriving the respective value for the type of aggregated data requested by at least one of the received data requests;
maintaining a respective request count for each of the optimized datasets, the respective count tallying the received data requests for which the optimized dataset is adapted to provide improved performance in value derivation as compared to the multidimensional dataset;
upon receipt of each of the data requests:
determining that a respective optimized dataset has not been stored for the type of aggregated data being requested by the data request;
in response to determining that the respective optimized dataset has not been stored:
deriving the respective value for the type of aggregated data requested by the data request, the value being derived based on data selected directly from the multidimensional dataset;
determining a first latency, the first latency being indicative of an amount of time, after receipt of the data request, that is required to derive the respective value using the multidimensional dataset; and
defining an optimized dataset based on the data request, the optimized dataset being a proper subset of the multidimensional dataset used to derive the respective value; and
calculating a benefit score for the optimized dataset based on a difference between the first latency and a second latency that is indicative of an amount of time required to derive the respective value using the optimized dataset, and the respective request count maintained for the optimized dataset.
Specification