Dynamically sharing a subtree of operators in a data stream management system operating on existing queries
First Claim
1. A computer-implemented method of processing a plurality of input streams of data, the method comprising:
- processing the plurality of input streams in a computer, by executing thereon a plurality of existing continuous queries based on a global plan;
during said processing, receiving a new continuous query to be executed;
during said processing, preparing an additional plan for use in execution of the new continuous query independent of the global plan;
wherein the additional plan is created and optimized independent of the global plan currently used by said processing, the additional plan including physical operators to execute the new continuous query;
during said processing, traversing the additional plan in a bottom up manner, to select therefrom a current node;
during said processing, checking if an operator at the current node is equivalent to any operator in a plurality of operators in the global plan currently used by said processing;
wherein two operators that perform identical functions with identical inputs are determined by said checking to be equivalent;
if said checking finds no equivalent, modifying the global plan currently used by said processing by adding thereto said operator at the current node to obtain a modified plan during said processing;
if said checking finds equivalence, and if said operator at the current node outputs a relation, propagating from said any operator, a current state of said relation to a new operator in the modified plan for use in executing said new continuous query prior to execution of the new continuous query;
wherein said propagating comprises using as said current state, a plurality of past tuples of said relation having a time stamp older than a current time;
returning to said traversing with the modified plan as the global plan, unless the current node is a root of the additional plan;
altering said processing, to cause execution of the new continuous query in addition to said plurality of existing continuous queries, based on the modified plan;
wherein said execution of the new continuous query on a stream in the plurality of input streams does not use any past tuples and instead uses new tuples that are time stamped after the current time; and
outputting from said computer, a stream generated based at least partially on processing of said data by executing the new continuous query.
1 Assignment
0 Petitions
Accused Products
Abstract
A new continuous query to a data stream management system (DSMS) may use several operators that are equivalent to operators currently being used by continuous queries that have been previously registered in the DSMS. To determine equivalence of operators, the DSMS checks at least the function and the data input to the operators. On finding equivalence, the DSMS modifies a global plan being executed, to use at least an existing subtree of operators during execution of the new continuous query, to generate a modified plan. The DSMS is also programmed to cause each relation source operator (which outputs a relation) to propagate a current state of the relation to each operator newly coupled to the relational operator. After propagation of current state to newly-coupled operators, each operator in the modified plan processes any new data and supplies the result to all operators coupled thereto, including newly-coupled operators and existing operators.
119 Citations
17 Claims
-
1. A computer-implemented method of processing a plurality of input streams of data, the method comprising:
-
processing the plurality of input streams in a computer, by executing thereon a plurality of existing continuous queries based on a global plan; during said processing, receiving a new continuous query to be executed; during said processing, preparing an additional plan for use in execution of the new continuous query independent of the global plan; wherein the additional plan is created and optimized independent of the global plan currently used by said processing, the additional plan including physical operators to execute the new continuous query; during said processing, traversing the additional plan in a bottom up manner, to select therefrom a current node; during said processing, checking if an operator at the current node is equivalent to any operator in a plurality of operators in the global plan currently used by said processing; wherein two operators that perform identical functions with identical inputs are determined by said checking to be equivalent; if said checking finds no equivalent, modifying the global plan currently used by said processing by adding thereto said operator at the current node to obtain a modified plan during said processing; if said checking finds equivalence, and if said operator at the current node outputs a relation, propagating from said any operator, a current state of said relation to a new operator in the modified plan for use in executing said new continuous query prior to execution of the new continuous query; wherein said propagating comprises using as said current state, a plurality of past tuples of said relation having a time stamp older than a current time; returning to said traversing with the modified plan as the global plan, unless the current node is a root of the additional plan; altering said processing, to cause execution of the new continuous query in addition to said plurality of existing continuous queries, based on the modified plan; wherein said execution of the new continuous query on a stream in the plurality of input streams does not use any past tuples and instead uses new tuples that are time stamped after the current time; and outputting from said computer, a stream generated based at least partially on processing of said data by executing the new continuous query. - View Dependent Claims (2, 3, 4, 13, 14)
-
-
5. A non-transitory computer readable storage medium encoded with a plurality of instructions for a computer to process a plurality of input streams of data, the instructions comprising:
-
instructions to process the plurality of input streams in said computer, by executing thereon a plurality of existing continuous queries based on a global plan; instructions to be executed during processing of the plurality of input streams, to receive a new continuous query to be executed; instructions to be executed during processing of the plurality of input streams, to prepare an additional plan for use in execution of the new continuous query independent of the global plan; wherein the additional plan is created and optimized independent of the global plan currently used by said instructions to process, the additional plan including physical operators to execute the new continuous query; instructions to be executed during processing of the plurality of input streams, to traverse the additional plan in a bottom up manner, to select therefrom a current node; instructions to be executed during processing of the plurality of input streams, to check if an operator at the current node is equivalent to any operator in a plurality of operators in the global plan currently used by said processing; instructions to be executed during processing of the plurality of input streams, wherein two operators that perform identical functions with identical inputs are determined to be equivalent by said instructions to check; instructions to be executed if execution of the instructions to check finds no equivalent, to modify the global plan currently used by said processing by adding thereto said operator at the current node to obtain a modified plan during said processing; instructions to be executed if execution of the instructions to check finds equivalence, and if said operator at the current node outputs a relation, to propagate from said any operator, a current state of said relation to a new operator in the modified plan for use in executing said new continuous query prior to execution of the new continuous query; wherein said instructions to propagate comprise instructions to use a plurality of past tuples of said relation having a time stamp older than a current time; instructions to return to said instructions to traverse with the modified plan as the global plan, unless the current node is a root of the additional plan; instructions to alter processing of the plurality of input streams, to cause execution of the new continuous query in addition to said plurality of existing continuous queries, based on the modified plan; wherein execution of the new continuous query on a stream in the plurality of input streams does not use any past tuples and instead uses new tuples that are time stamped after the current time; and instructions to output from said computer, a stream generated based at least partially on processing of said data by executing the new continuous query. - View Dependent Claims (6, 7, 8, 15, 16)
-
-
9. A computer comprising:
-
means for processing a plurality of input streams in said computer, by executing thereon a plurality of existing continuous queries based on a global plan; means, operable during processing of the plurality of input streams, for receiving a new continuous query to be executed; means, operable during processing of the plurality of input streams, for preparing an additional plan for use in execution of the new continuous query independent of the global plan; wherein the additional plan is created and optimized independent of the global plan currently used by said means for processing, the additional plan including physical operators to execute the new continuous query; means, operable during processing of the plurality of input streams, for traversing the additional plan in a bottom up manner, to select therefrom a current node; means, operable during processing of the plurality of input streams, for checking if an operator at the current node is equivalent to any operator in a plurality of operators in the global plan currently used by said processing; means, operable during processing of the plurality of input streams, wherein two operators that perform identical functions with identical inputs are determined to be equivalent by said means for checking; means, operable if the means for checking finds no equivalent, for modifying the global plan currently used by said processing by adding thereto said operator at the current node to obtain a modified plan during said processing; means, operable if the means for checking finds equivalence, and if said operator at the current node outputs a relation, for propagating from said any operator, a current state of said relation to a new operator in the modified plan for use in executing said new continuous query prior to execution of the new continuous query; wherein said means for propagating comprise means for using as said current state, a plurality of past tuples of said relation having a time stamp older than a current time; means for returning control to said means for traversing with the modified plan as the global plan, unless the current node is a root of the additional plan; means for altering processing of the plurality of input streams, to cause execution of the new continuous query in addition to said plurality of existing continuous queries, based on the modified plan; wherein execution of the new continuous query on a stream in the plurality of input streams does not use any past tuples and instead uses new tuples that are time stamped after the current time; and means for outputting from said computer, a stream generated based at least partially on processing of said data by executing the new continuous query. - View Dependent Claims (10, 11, 12, 17)
-
Specification