Implementation of stream algebra over class instances
First Claim
Patent Images
1. A method of processing high-volume continuous streams of data, said method comprising:
- receiving a query comprising query data and operator semantics, said query representing a request for data;
translating the received query into a plurality of stream algebra operators;
defining an execution plan representing a data flow implemented by the stream algebra operators;
receiving a first event from a high-volume continuous data stream while maintaining minimal state information associated with the data;
assigning an expiration time to the received first event as a function of the query data;
receiving a plurality of second events from the data stream;
executing one or more of the stream algebra operators according to the defined execution plan and as a function of the operator semantics from the received query to find a match between the received first event and at least one of the received plurality of second events prior to expiration of the first event based on the assigned expiration time associated with the first event wherein;
if one of the second events does not match the first event, then continuing to wait until another second event is received; and
if one of the second events matches the first event, then combining the first and second events to produce a third event; and
executing the third event according to the execution plan.
2 Assignments
0 Petitions
Accused Products
Abstract
Creating and executing a distributed stream processing operator graph based on a query. The operator graph includes movable stream algebra operators for processing events received from high volume data streams. The operators are partially compiled and distributed to computing devices for completion of the compilation and subsequent execution. During execution, the operators maintain minimal state information associated with received events via an expiration time assigned to each of the event instances. Additional events are generated and aggregated by the operators for communication to a service responsible for the query.
124 Citations
16 Claims
-
1. A method of processing high-volume continuous streams of data, said method comprising:
-
receiving a query comprising query data and operator semantics, said query representing a request for data; translating the received query into a plurality of stream algebra operators; defining an execution plan representing a data flow implemented by the stream algebra operators; receiving a first event from a high-volume continuous data stream while maintaining minimal state information associated with the data; assigning an expiration time to the received first event as a function of the query data; receiving a plurality of second events from the data stream; executing one or more of the stream algebra operators according to the defined execution plan and as a function of the operator semantics from the received query to find a match between the received first event and at least one of the received plurality of second events prior to expiration of the first event based on the assigned expiration time associated with the first event wherein; if one of the second events does not match the first event, then continuing to wait until another second event is received; and if one of the second events matches the first event, then combining the first and second events to produce a third event; and executing the third event according to the execution plan. - View Dependent Claims (2, 3, 6, 7, 8, 9)
-
-
4. A method of processing high-volume continuous streams of data while maintaining minimal state information associated with the data, said method comprising:
-
receiving a query comprising query data and operator semantics, said query representing a request for data; translating the received query into a plurality of stream algebra operators; defining an execution plan representing a data flow implemented by the stream algebra operators; receiving a first event from a data stream; assigning an expiration time to the received first event as a function of the query data; receiving a plurality of second events from the data stream; executing one or more of the stream algebra operators according to the defined execution plan and as a function of the operator semantics from the received query to find a match between the received first event and at least one of the received plurality of second events prior to expiration of the first event based on the assigned expiration time associated with the first event; producing, as a function of said executing, a third event to be executed according to the execution plan; identifying the destination computing device and the one of the stream algebra operators based on the execution plan; pausing execution of the stream algebra operator; storing state information maintained by the stream algebra operator; and sending the stream algebra operator and the stored state information to the destination computing device.
-
-
5. A method of processing high-volume continuous streams of data while maintaining minimal state information associated with the data, said method comprising:
-
receiving a query comprising query data and operator semantics, said query representing a request for data; translating the received query into a plurality of stream algebra operators; defining an execution plan representing a data flow implemented by the stream algebra operators; receiving a first event from a data stream; assigning an expiration time to the received first event as a function of the query data; receiving a plurality of second events from the data stream; executing one or more of the stream algebra operators according to the defined execution plan and as a function of the operator semantics from the received query to find a match between the received first event and at least one of the received plurality of second events prior to expiration of the first event based on the assigned expiration time associated with the first event; producing, as a function of said executing, a third event to be executed according to the execution plan; and adjusting the expiration time assigned to the first event to minimize the quantity of state information maintained by the stream algebra operator.
-
-
10. A system for processing high-volume continuous streams of data while maintaining minimal state information associated with the data, said system comprising:
-
a processor configured to execute computer-executable instructions for; receiving a query comprising query data and operator semantics, said query representing a request for data; and translating the received query into a plurality of stream algebra operators; a memory area for storing the plurality of stream algebra operators for building a distributed stream processing operator graph, each of said stream algebra operators being adapted to perform one or more functions on data events from an event stream; and said processor configured to execute computer-executable instructions for building a distributed operator graph using the stream algebra operators stored in the memory area by; defining an object to include an expression, said expression representing the one or more functions to be performed on the data events from the event stream; serializing the defined object including the expression into a message corresponding to one of the plurality of stream algebra operators stored in the memory area; identifying one of a plurality of destination computing devices to host said one of the plurality of stream algebra operators; transmitting the message to the identified destination computing device to build the distributed operator graph, wherein the destination computing device compiles the expression from the message into executable code representing said one of the plurality of stream algebra operators implementing a portion of the operator graph, wherein the executable code operates on the data events received from the event stream to produce aggregated events; receiving the aggregated events from the destination computing device; and processing the aggregated events in accordance with another portion of the operator graph implemented by the processor. - View Dependent Claims (11, 12, 13, 14, 15, 16)
-
Specification