Support for user defined functions in a data stream management system
First Claim
1. A method implemented in at least one computer, the method comprising:
- receiving a plurality of incoming streams of data;
wherein the data in each incoming stream arrives indefinitely;
processing the data received in the plurality of incoming streams, to execute thereon a plurality of continuous queries based on an existing global plan;
during said processing, receiving from a user, a command to create a function and identification of a set of instructions to be executed to perform the function;
in response to receipt of said command, creating in a memory of said at least one computer, an instance of the set of instructions identified by the identification received from the user, and a first structure comprising a pointer to the instance of the set of instructions;
during said processing, receiving a new continuous query to be executed using the function;
during said processing, based on the first structure, creating in the memory an operator to invoke the instance of the set of instructions, the operator comprising a second structure, the second structure comprising a first field to hold the pointer to the instance of the set of instructions, and at least one additional field corresponding to at least one argument of the function;
during said processing, at least one processor in said at least one computer modifying the existing global plan by adding thereto said operator, thereby to obtain a modified global plan;
altering said processing, to cause execution of the new continuous query in addition to the plurality of continuous queries, based on the modified global plan, thereby to perform the function; and
based at least partially on processing of at least a portion of the data by executing the new continuous query, outputting from said at least one computer an output stream of data;
wherein during the execution of the new continuous query, the instance of the set of instructions is invoked repeatedly on receipt of each tuple of the data without re-creating the instance of the set of instructions.
0 Assignments
0 Petitions
Accused Products
Abstract
A data stream management system (DSMS) is designed to support a new user-defined function, by creating and using at least two structures as follows. A first structure (“metadata entry”) is created in response to a command for creation of the new function, and maps a single instance of a class to the function'"'"'s name. A second structure is created with creation of an operator on receipt of each new continuous query that uses the new function. The second structure (“operator specific data structure”) contains a path to the newly-created instance, which path is obtained by looking up the first structure. Additional second structures are created on receipt of additional continuous queries which use the new function, but all second structures contain the same path. All continuous queries use the same instance. Repeated use of a single instance to compile and execute multiple queries eliminates repeated instantiation of the same function.
-
Citations
25 Claims
-
1. A method implemented in at least one computer, the method comprising:
-
receiving a plurality of incoming streams of data; wherein the data in each incoming stream arrives indefinitely; processing the data received in the plurality of incoming streams, to execute thereon a plurality of continuous queries based on an existing global plan; during said processing, receiving from a user, a command to create a function and identification of a set of instructions to be executed to perform the function; in response to receipt of said command, creating in a memory of said at least one computer, an instance of the set of instructions identified by the identification received from the user, and a first structure comprising a pointer to the instance of the set of instructions; during said processing, receiving a new continuous query to be executed using the function; during said processing, based on the first structure, creating in the memory an operator to invoke the instance of the set of instructions, the operator comprising a second structure, the second structure comprising a first field to hold the pointer to the instance of the set of instructions, and at least one additional field corresponding to at least one argument of the function; during said processing, at least one processor in said at least one computer modifying the existing global plan by adding thereto said operator, thereby to obtain a modified global plan; altering said processing, to cause execution of the new continuous query in addition to the plurality of continuous queries, based on the modified global plan, thereby to perform the function; and based at least partially on processing of at least a portion of the data by executing the new continuous query, outputting from said at least one computer an output stream of data; wherein during the execution of the new continuous query, the instance of the set of instructions is invoked repeatedly on receipt of each tuple of the data without re-creating the instance of the set of instructions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 21, 22)
-
-
10. One or more computer readable non-transitory storage media encoded with a plurality of instructions to be executed in at least one computer, said instructions comprising:
-
instructions to process a plurality of incoming streams of data, to execute thereon a plurality of continuous queries based on a global plan; wherein the data in each incoming stream arrives indefinitely; instructions to receive a command to create a function and identification of a set of additional instructions to be executed to perform said function; instructions to create in a memory of said at least one computer, an instance of said set of additional instructions and to further create a first structure comprising a reference to said instance of said set of additional instructions; instructions to receive a new continuous query to be executed using said function; instructions based on said first structure, to create in said memory an operator to invoke the instance, the operator comprising a second structure, the second structure comprising a first field to hold said reference to said instance of the set of instructions, and at least one additional field corresponding to at least one argument of said function; instructions to modify the global plan by adding thereto said operator, thereby to obtain a modified plan; instructions to alter said instructions to process, to cause execution of the new continuous query in addition to said plurality of continuous queries, based on the modified plan, thereby to perform said function; and instructions to output from said at least one computer, a stream generated based at least partially on processing of said data by executing the new continuous query; wherein during the execution of the new continuous query, the instance of the set of instructions is to be invoked repeatedly on receipt of each tuple of the data without re-creating the instance of the set of instructions. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
-
-
18. A method implemented in at least one computer, the method comprising:
-
processing a plurality of incoming streams of data comprising time-stamped tuples, to execute thereon a plurality of existing queries based on an existing global plan in a memory of said at least one computer; wherein the data in each incoming stream arrives indefinitely; during said processing, receiving a command to create a function and identification of a set of instructions to be executed to perform said function; during said processing, creating in said memory, an instance of said set of instructions and a metadata structure comprising a pointer to said instance; during said processing, receiving a new query to be executed using said function; during said processing, based on said metadata structure in said memory, creating in said memory a new operator to invoke the instance, the new operator comprising an operator-specific structure, the operator-specific structure comprising a first field to hold said pointer to said instance of said set of instructions, and at least one additional field corresponding to at least one argument of said function; during said processing, at least one processor in said at least one computer modifying the existing global plan in said memory by adding thereto said new operator, thereby to obtain a modified global plan in said memory; altering said processing to use the modified global plan, thereby to cause execution of the new query in addition to execution of said plurality of existing queries; and outputting from said at least one computer, an output stream of data generated based at least partially on processing of time-stamped tuples in said incoming streams by execution of the new query; wherein during the execution of the new query, the instance of the set of instructions is invoked repeatedly on receipt of each tuple of the data without re-creating the instance of the set of instructions. - View Dependent Claims (19, 20)
-
-
23. A data stream management system that processes a plurality of incoming streams of data using a plurality of continuous queries, the data stream management system comprising:
-
at least one processor; at least one memory coupled to the at least one processor, the at least one memory comprising an existing global plan; means for receiving a plurality of incoming streams of data; wherein the data in each incoming stream arrives indefinitely; means for processing the data received in the plurality of incoming streams, to execute thereon a plurality of continuous queries based on the existing global plan; means for receiving from a user, during operation of said means for processing, a command to create a function and identification of a set of instructions to be executed to perform the function, means for creating in said at least one memory, an instance of the set of instructions identified by the identification received from the user, and a first structure comprising a pointer to the instance of the set of instructions; means for receiving, during operation of said means for processing, a new continuous query to be executed using the function; means for creating in the memory, during operation of said means for processing, and based on the first structure, an operator to invoke the instance of the set of instructions, the operator comprising a second structure, the second structure comprising a first field to hold the pointer to the instance of the set of instructions, and at least one additional field corresponding to at least one argument of the function; means for modifying the existing global plan, during operation of said means for processing, by adding said operator to the existing global plan, thereby to obtain a modified global plan; means for altering operation of said means for processing, to cause execution of the new continuous query in addition to the plurality of continuous queries, based on the modified global plan, thereby to perform the function; and means, based at least partially on processing of at least a portion of the data by executing the new continuous query, for outputting from the data stream management system an output stream of data; wherein during the execution of the new continuous query, the instance of the set of instructions is invoked repeatedly on receipt of each tuple of the data without re-creating the instance of the set of instructions. - View Dependent Claims (24, 25)
-
Specification