PARALLEL PROCESSING OF CONTINUOUS QUERIES ON DATA STREAMS
Abstract
A continuous query parallel engine on data streams provides scalability and increases throughput through the addition of new nodes. The parallel processing can be applied to data stream processing and complex event processing. The continuous query parallel engine receives the query to be deployed and splits the original query into subqueries, obtaining at least one subquery; each subquery is executed in at least one node. Tuples produced by each operator of each subquery are labeled with timestamps. A load balancer is interposed at the output of each node that executes an instance of the source subquery, and an input merger is interposed in each node that executes an instance of a destination subquery. After checks are performed, further load balancers or input mergers may be added.
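The key-based routing role of the load balancer can be illustrated with a minimal sketch. It assumes hash partitioning on a string key, which the text does not specify; `StreamTuple`, `KeyLoadBalancer`, `route`, and `dispatch` are hypothetical names introduced only for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass(frozen=True)
class StreamTuple:
    """Hypothetical tuple layout: a partitioning key, a timestamp, a payload."""
    key: str
    timestamp: int
    payload: Any = None


class KeyLoadBalancer:
    """Routes tuples so that equal keys always reach the same downstream instance.

    Assumption: hash partitioning with CRC32. The source only requires that
    tuples with the same key go to the same instance.
    """

    def __init__(self, num_instances: int) -> None:
        if num_instances < 1:
            raise ValueError("at least one downstream instance is required")
        self.num_instances = num_instances

    def route(self, t: StreamTuple) -> int:
        # CRC32 is deterministic across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(t.key.encode("utf-8")) % self.num_instances

    def dispatch(self, tuples: Iterable[StreamTuple]) -> List[List[StreamTuple]]:
        # One output buffer per downstream instance of the destination subquery.
        outputs: List[List[StreamTuple]] = [[] for _ in range(self.num_instances)]
        for t in tuples:
            outputs[self.route(t)].append(t)
        return outputs
```

Any deterministic function of the key would serve the same purpose; CRC32 is used here only because it gives the same result on every node.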
206 Citations
16 Claims
1. A parallel stream processing engine of continuous queries formed by a plurality of instances wherein each instance is executed in any processing node of the processing engine, wherein the cooperation of instances processes a query, comprising:
a) means for receiving the query to be deployed;
b) means for splitting the original query into subqueries, obtaining at least one subquery; each subquery is executed at least in one node;
c) means for labeling with timestamps the tuples produced by each operator of each subquery;
d) wherein between each two consecutive subqueries,
  i) a load balancer is interposed at the output of each node, the load balancer executing each of the instances of the source subquery;
  ii) the output from the load balancer is connected with all the nodes in which one of the instances of the destination subquery is executed;
  iii) an input merger is interposed in each node, the input merger executing each of the instances of destination subquery;
  iv) checking is performed to determine if all subqueries contain at most a stateful operator and if inputs are connected to previous subqueries, wherein:
    a) if the checking succeeds, the load balancer sends received tuples with a same key to a same instance of the source subquery;
    b) if the checking fails, a load balancer and an input merger are interposed before each stateful operator that is preceded by any other operator, such that the stateful operator in the node where each of the instances of the destination query are executed sends all received tuples with the same key to the same instance of the destination subquery.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
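As an illustration of the input merger in step d) iii), the sketch below combines the streams arriving from several load balancers into one stream ordered by timestamp. This is an assumption-laden example: the claim says only that tuples are timestamped and that an input merger is interposed, so the ordering policy, the `(timestamp, key, payload)` layout, and the name `merge_inputs` are all hypothetical. It further assumes each incoming stream is already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical tuple layout: (timestamp, key, payload).
TimestampedTuple = Tuple[int, str, object]


def merge_inputs(*streams: Iterable[TimestampedTuple]) -> Iterator[TimestampedTuple]:
    """Lazily merge several timestamp-ordered streams into one ordered stream.

    Assumption: every input stream is individually sorted by timestamp.
    heapq.merge then yields tuples in global timestamp order without
    buffering the full streams.
    """
    return heapq.merge(*streams, key=lambda t: t[0])


if __name__ == "__main__":
    from_node_a = [(1, "k1", "x"), (4, "k2", "y")]
    from_node_b = [(2, "k1", "z"), (3, "k3", "w")]
    for item in merge_inputs(from_node_a, from_node_b):
        print(item)
```

Merging lazily keeps memory use proportional to the number of upstream nodes rather than to the stream length, which matters for unbounded streams.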
9. A method of parallel stream processing continuous queries formed by a plurality of instances wherein each instance is executed in a processing node comprising a processing engine, wherein the cooperation of instances processes a query, comprising the following steps:
a) receiving the query to be deployed;
b) splitting the original query into subqueries, obtaining at least one subquery; each subquery is executed at least in one node;
c) labeling with timestamps the tuples produced by each operator of each subquery;
d) between each two consecutive subqueries,
  i) interposing a load balancer at the output of each node, the load balancer executing each of the instances of the source subquery;
  ii) connecting the output from the load balancer with all the nodes in which one of the instances of the destination subquery is executed;
  iii) interposing an input merger in each node, the input merger executing each of the instances of destination subquery;
  iv) checking if all subqueries contain at most a stateful operator and if inputs are connected to previous subqueries, wherein:
    a) if the checking succeeds, the load balancer sends received tuples with a same key to a same instance of the source subquery;
    b) if the checking fails, a load balancer and an input merger are interposed before each stateful operator that is preceded by any other operator, such that the stateful operator in the node where each of the instances of the destination query are executed sends all received tuples with the same key to the same instance of the destination subquery.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
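The check in step d) iv) can be sketched as follows. This is one possible reading, not the patented implementation: it assumes a subquery is a linear list of operators, and it interprets "inputs are connected to previous subqueries" as meaning that a stateful operator, if present, is the first operator of its subquery. `Operator`, `check_passes`, and `operators_needing_balancer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description: a name and a stateful flag."""
    name: str
    stateful: bool = False


# Assumption: a subquery is modeled as a linear chain of operators.
Subquery = List[Operator]


def check_passes(subqueries: List[Subquery]) -> bool:
    """Return True if every subquery has at most one stateful operator and
    that operator (if any) comes first, so its input arrives directly from
    the previous subquery."""
    for subquery in subqueries:
        positions = [i for i, op in enumerate(subquery) if op.stateful]
        if len(positions) > 1:
            return False
        if positions and positions[0] != 0:
            return False
    return True


def operators_needing_balancer(subqueries: List[Subquery]) -> List[str]:
    """Names of stateful operators preceded by another operator; when the
    check fails, these would get an extra load balancer and input merger."""
    return [
        op.name
        for subquery in subqueries
        for i, op in enumerate(subquery)
        if op.stateful and i > 0
    ]
```

For example, a subquery `[filter, aggregate]` with a stateful `aggregate` fails the check under this reading, and `aggregate` is reported as needing its own load balancer and input merger.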
Specification