Profile-driven data stream processing
First Claim
1. A method for compiling a data stream processing application, wherein the method comprises:
receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function;
determining, by the compiler, one or more characteristics of the plurality of operators within the data stream processing application, wherein said determining comprises:
injecting profiling code into an instrumented version of the data stream processing application;
running the data stream processing application under a sample workload;
using the profiling code to collect one or more computation and communication characteristics of the plurality of operators within the data stream processing application; and
processing the one or more collected computation and communication characteristics to compute (i) for each of the plurality of operators, an average amount of demanded processing resources, and (ii) for each port of the plurality of operators, a mean data rate;
grouping, by the compiler, the plurality of operators into one or more execution containers based on the average amount of demanded processing resources and the mean data rate computations; and
compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the plurality of operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
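The profiling step of the claim can be illustrated with a minimal sketch, assuming the injected profiling code emits two kinds of records: per-operator CPU measurements and per-port tuple counts over a sampling interval. The record layouts, the function name `summarize_profile`, and the choice of "fraction of one CPU" and "tuples per second" as units are assumptions made for illustration; the claim itself does not specify them.

```python
from collections import defaultdict
from statistics import mean


def summarize_profile(cpu_samples, port_samples):
    """Reduce raw profiling records to the two statistics named in the claim.

    cpu_samples:  iterable of (operator, cpu_fraction) records (hypothetical layout).
    port_samples: iterable of (operator, port, tuples, interval_seconds) records
                  (hypothetical layout).
    Returns (avg_cpu, mean_rate):
      avg_cpu[operator]           -> average demanded CPU fraction
      mean_rate[(operator, port)] -> mean data rate in tuples per second
    """
    cpu_by_operator = defaultdict(list)
    for operator, cpu_fraction in cpu_samples:
        cpu_by_operator[operator].append(cpu_fraction)

    rate_by_port = defaultdict(list)
    for operator, port, tuples, interval_s in port_samples:
        # Convert each interval's tuple count into a rate before averaging.
        rate_by_port[(operator, port)].append(tuples / interval_s)

    avg_cpu = {op: mean(values) for op, values in cpu_by_operator.items()}
    mean_rate = {key: mean(values) for key, values in rate_by_port.items()}
    return avg_cpu, mean_rate
```

The two returned dictionaries correspond to items (i) and (ii) of the processing step: one value per operator and one value per port.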
Abstract
Techniques for compiling a data stream processing application are provided. The techniques include receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function, determining, by the compiler, one or more characteristics of operators within the data stream processing application, grouping, by the compiler, the operators into one or more execution containers based on the one or more characteristics, and compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
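The grouping step can be sketched in the same spirit. The abstract does not say which grouping algorithm the compiler uses, so the following is an assumed greedy heuristic rather than the patented method: operators connected by the highest-rate streams are fused into the same container first, as long as the container's summed average CPU demand stays within a capacity limit. The function name `group_operators`, the `capacity` parameter, and the representation of streams as `(source, destination)` pairs are hypothetical.

```python
def group_operators(avg_cpu, edge_rate, capacity=1.0):
    """Greedily group operators into containers (assumed heuristic).

    avg_cpu:   {operator: average demanded CPU fraction}
    edge_rate: {(src_operator, dst_operator): mean data rate on that stream}
    capacity:  maximum summed CPU demand allowed per container (assumption)
    Returns a sorted list of containers, each a sorted list of operator names.
    """
    parent = {op: op for op in avg_cpu}  # union-find forest over operators
    load = dict(avg_cpu)                 # CPU load of each group, keyed by its root

    def find(op):
        # Follow parent links to the group's root, halving paths on the way.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    # Visit the busiest streams first so heavy traffic stays inside one container.
    for (src, dst), _rate in sorted(edge_rate.items(), key=lambda kv: kv[1], reverse=True):
        a, b = find(src), find(dst)
        if a != b and load[a] + load[b] <= capacity:
            parent[b] = a
            load[a] += load[b]

    groups = {}
    for op in avg_cpu:
        groups.setdefault(find(op), []).append(op)
    return sorted(sorted(members) for members in groups.values())
```

With this heuristic, a high-rate stream between two light operators is kept inside one container, while a stream whose endpoints would together exceed `capacity` is left to cross a container boundary.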
19 Claims
1. A method for compiling a data stream processing application, wherein the method comprises:
receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function;
determining, by the compiler, one or more characteristics of the plurality of operators within the data stream processing application, wherein said determining comprises:
injecting profiling code into an instrumented version of the data stream processing application;
running the data stream processing application under a sample workload;
using the profiling code to collect one or more computation and communication characteristics of the plurality of operators within the data stream processing application; and
processing the one or more collected computation and communication characteristics to compute (i) for each of the plurality of operators, an average amount of demanded processing resources, and (ii) for each port of the plurality of operators, a mean data rate;
grouping, by the compiler, the plurality of operators into one or more execution containers based on the average amount of demanded processing resources and the mean data rate computations; and
compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the plurality of operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product comprising a non-transitory tangible computer readable recordable storage device including computer useable program code for compiling a data stream processing application, wherein the computer usable program code comprises one or more distinct software modules, the computer program product including:
computer useable program code for receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function;
computer useable program code for determining, by the compiler, one or more characteristics of the plurality of operators within the data stream processing application, wherein said determining comprises:
injecting profiling code into an instrumented version of the data stream processing application;
running the data stream processing application under a sample workload;
using the profiling code to collect one or more computation and communication characteristics of the plurality of operators within the data stream processing application; and
processing the one or more collected computation and communication characteristics to compute (i) for each of the plurality of operators, an average amount of demanded processing resources, and (ii) for each port of the plurality of operators, a mean data rate;
computer useable program code for grouping, by the compiler, the plurality of operators into one or more execution containers based on the average amount of demanded processing resources and the mean data rate computations; and
computer useable program code for compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the plurality of operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
Dependent claims: 9, 10, 11, 12, 13
14. A system for compiling a data stream processing application, comprising:
a memory; and
at least one processor coupled to the memory and operative to:
receive, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function;
determine, by the compiler, one or more characteristics of the plurality of operators within the data stream processing application, wherein said determining comprises:
injecting profiling code into an instrumented version of the data stream processing application;
running the data stream processing application under a sample workload;
using the profiling code to collect one or more computation and communication characteristics of the plurality of operators within the data stream processing application; and
processing the one or more collected computation and communication characteristics to compute (i) for each of the plurality of operators, an average amount of demanded processing resources, and (ii) for each port of the plurality of operators, a mean data rate;
group, by the compiler, the plurality of operators into one or more execution containers based on the average amount of demanded processing resources and the mean data rate computations; and
compile, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the plurality of operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition.
Dependent claims: 15, 16, 17, 18, 19
Specification