Method and apparatus for efficient aggregate computation over data streams
First Claim
1. A method, comprising:
obtaining a data stream;
obtaining a set of aggregate queries to be executed on the data stream; and
generating a query plan for executing the set of aggregate queries on the data stream, wherein the generated query plan comprises generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model;
wherein the generated query plan comprises a tree structure, the query plan generating step further comprises determining an optimal query plan with a lowest computation cost by determining a minimum-cost aggregate tree, and the minimum-cost aggregate tree is determined using a heuristic which adds one or more random aggregate queries to the aggregate tree to form an expanded aggregate graph, and uses a directed Steiner tree heuristic to find the minimum-cost aggregate subtree of the expanded aggregate graph;
wherein the generation of the query plan is implemented by executing one or more software programs on a processor device.
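The claim's comparison of candidate aggregate trees under "a given cost model" can be illustrated with a small sketch. It is not the claimed directed Steiner tree heuristic. It only scores two hand-written trees under an assumed cost model in which each aggregate pays one unit per tuple emitted by its parent. The node names, the output sizes, and the `plan_cost` helper are all hypothetical.

```python
def plan_cost(parents, output_sizes):
    """Assumed cost model: every non-root node pays one unit per tuple
    that its parent emits in an epoch."""
    return sum(output_sizes[parent] for parent in parents.values())

# Hypothetical per-epoch output sizes: raw stream tuples, then group counts.
output_sizes = {"stream": 1000, "src,dst": 50, "src": 10, "dst": 8}

# Tree 1: both user queries read the raw stream directly.
naive = {"src": "stream", "dst": "stream"}

# Tree 2: an intermediate aggregate on (src, dst) feeds both user queries.
with_intermediate = {"src,dst": "stream", "src": "src,dst", "dst": "src,dst"}

naive_cost = plan_cost(naive, output_sizes)
intermediate_cost = plan_cost(with_intermediate, output_sizes)

# Keep whichever candidate tree is cheaper under the assumed model.
best = min((naive, with_intermediate), key=lambda p: plan_cost(p, output_sizes))
```

Under these assumed sizes the tree with the intermediate aggregate is cheaper because the two user queries scan 50 groups each instead of 1000 raw tuples each. With a much larger intermediate output the comparison could go the other way, which is why the choice is made by a cost model.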
Abstract
Improved techniques are disclosed for processing data stream queries wherein a data stream is obtained, a set of aggregate queries to be executed on the data stream is obtained, and a query plan for executing the set of aggregate queries on the data stream is generated. In a first method, the generated query plan includes generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model. In a second method, the generated query plan includes identifying similar filters in two or more aggregate queries of the set of aggregate queries and combining the similar filters into a single filter such that the single filter is usable to pre-filter data input to the two or more aggregate queries.
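As a minimal sketch of the first method, assume two count queries that group a stream of (source, destination) pairs by source and by destination respectively. A finer intermediate aggregate on the pair (source, destination) is computed once from the stream, and both queries are then answered from its groups rather than from the raw tuples. The record layout and the function names are assumptions made for illustration and do not come from the patent.

```python
from collections import Counter

def direct_counts(stream, key_index):
    """Answer one count-per-group query by scanning every stream tuple."""
    return Counter(record[key_index] for record in stream)

def pre_aggregate(stream, key_indexes):
    """Intermediate aggregate: count tuples per combined grouping key."""
    return Counter(tuple(record[i] for i in key_indexes) for record in stream)

def roll_up(intermediate, position):
    """Derive a coarser count by summing the intermediate aggregate's groups."""
    result = Counter()
    for key, count in intermediate.items():
        result[key[position]] += count
    return result

# Hypothetical stream of (source, destination) pairs.
stream = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]

intermediate = pre_aggregate(stream, (0, 1))  # one pass over the raw stream
by_src = roll_up(intermediate, 0)             # reads groups, not raw tuples
by_dst = roll_up(intermediate, 1)
```

Here five raw tuples collapse into three intermediate groups, so each downstream query processes three items instead of five. The saving depends on how many distinct groups the intermediate aggregate produces relative to the stream volume.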
29 Citations
3 Claims
1. A method, comprising:
obtaining a data stream;
obtaining a set of aggregate queries to be executed on the data stream; and
generating a query plan for executing the set of aggregate queries on the data stream, wherein the generated query plan comprises generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model;
wherein the generated query plan comprises a tree structure, the query plan generating step further comprises determining an optimal query plan with a lowest computation cost by determining a minimum-cost aggregate tree, and the minimum-cost aggregate tree is determined using a heuristic which adds one or more random aggregate queries to the aggregate tree to form an expanded aggregate graph, and uses a directed Steiner tree heuristic to find the minimum-cost aggregate subtree of the expanded aggregate graph;
wherein the generation of the query plan is implemented by executing one or more software programs on a processor device.
3. Apparatus, comprising:
a memory; and
a processor coupled to the memory and operative to:
obtain a data stream;
obtain a set of aggregate queries to be executed on the data stream; and
generate a query plan for executing the set of aggregate queries on the data stream, wherein the generated query plan comprises at least one of:
(i) generating at least one intermediate aggregate query, wherein the intermediate aggregate query combines a subset of aggregate queries from the set of aggregate queries so as to pre-aggregate data from the data stream prior to execution of the subset of aggregate queries such that the generated query plan is optimized for computational expense based on a given cost model; and
(ii) identifying similar filters in two or more aggregate queries of the set of aggregate queries and combining the similar filters into a single filter such that the single filter is usable to pre-filter data input to the two or more aggregate queries;
wherein the generated query plan comprises a tree structure, the query plan generating operation further comprises determining an optimal query plan with a lowest computation cost by determining a minimum-cost aggregate tree, and the minimum-cost aggregate tree is determined using a heuristic which adds one or more random aggregate queries to the aggregate tree to form an expanded aggregate graph, and uses a directed Steiner tree heuristic to find the minimum-cost aggregate subtree of the expanded aggregate graph.
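Option (ii) can be sketched under one simple assumption about what "similar filters" means: each query filters on a numeric range over the same attribute, and the combined filter is the smallest range covering all of them. Tuples outside the covering range are dropped once, and each query then applies its own range only to the survivors. The range-based combination, the record layout, and the function names are illustrative assumptions, not the patent's definition.

```python
from collections import Counter

def covering_range(ranges):
    """Combine similar range filters into one range that covers them all."""
    ranges = list(ranges)
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)

def run_queries(stream, query_ranges):
    """Count tuples per source for each query, using a shared pre-filter."""
    lo, hi = covering_range(query_ranges.values())
    results = {name: Counter() for name in query_ranges}
    survivors = 0
    for src, port in stream:
        if not (lo <= port <= hi):  # single shared pre-filter check
            continue
        survivors += 1
        for name, (q_lo, q_hi) in query_ranges.items():
            if q_lo <= port <= q_hi:  # each query's own residual filter
                results[name][src] += 1
    return results, survivors

# Hypothetical stream of (source, port) pairs and two overlapping filters.
stream = [("a", 80), ("b", 22), ("a", 95), ("c", 8080), ("b", 88), ("a", 443)]
query_ranges = {"q1": (80, 90), "q2": (85, 100)}
results, survivors = run_queries(stream, query_ranges)
```

Only the tuples that pass the shared check reach the per-query filters, so the benefit grows with the fraction of the stream that no query is interested in.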
Specification