Operator graph changes in response to dynamic connections in stream computing applications
First Claim
1. A computer program product for optimizing a stream computing application comprising:
- a non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising computer-readable program code configured to;
execute a first job and a second job, each comprising a plurality of respective operators that process streaming data by operation of one or more computer processors, wherein the plurality of respective operators in the first and second jobs are, respectively, interconnected such that data tuples flow between the plurality of respective operators to perform the first and second jobs;
establish an operator graph comprising the plurality of respective operators of both the first and second jobs, the operator graph defining at least one respective execution path through the plurality of respective operators for the first job and for the second job;
while the first and second jobs are executing, establish a connection between the first job and the second job by transmitting a data stream from a first operator of the first job to a second operator of the second job, wherein the first and second jobs are in the operator graph both before and after the connection is established;
before establishing the connection between the first job to the second job, set the data stream as exportable, wherein the plurality of respective operators associated with the second job do not receive data from, or send data to, the plurality of respective operators associated with the first job prior to setting the data stream as exportable;
monitor a performance indicator associated with the first operator of the first job, the performance indicator measuring an effect that the connection between the first and second jobs has on a performance of the first job; and
upon determining a value of the performance indicator satisfies a predefined threshold, optimize the stream computing application to improve the value of the performance indicator.
1 Assignment
0 Petitions
Accused Products
Abstract
A stream computing application may permit one job to connect to a data stream of a different job. As more and more jobs dynamically connect to the data stream, the connections may have a negative impact on the performance of the job that generates the data stream. Accordingly, a variety of metrics and statistics (e.g., CPU utilization or tuple rate) may be monitored to determine if the dynamic connections are harming performance. If so, the stream computing system may be optimized to mitigate the effects of the dynamic connections. For example, particular operators may be unfused from a processing element and moved to a compute node that has available computing resources. Additionally, the stream computing application may clone the data stream in order to distribute the workload of transmitting the data stream to the connected jobs.
93 Citations
13 Claims
-
1. A computer program product for optimizing a stream computing application comprising:
-
a non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising computer-readable program code configured to; execute a first job and a second job, each comprising a plurality of respective operators that process streaming data by operation of one or more computer processors, wherein the plurality of respective operators in the first and second jobs are, respectively, interconnected such that data tuples flow between the plurality of respective operators to perform the first and second jobs; establish an operator graph comprising the plurality of respective operators of both the first and second jobs, the operator graph defining at least one respective execution path through the plurality of respective operators for the first job and for the second job; while the first and second jobs are executing, establish a connection between the first job and the second job by transmitting a data stream from a first operator of the first job to a second operator of the second job, wherein the first and second jobs are in the operator graph both before and after the connection is established; before establishing the connection between the first job to the second job, set the data stream as exportable, wherein the plurality of respective operators associated with the second job do not receive data from, or send data to, the plurality of respective operators associated with the first job prior to setting the data stream as exportable; monitor a performance indicator associated with the first operator of the first job, the performance indicator measuring an effect that the connection between the first and second jobs has on a performance of the first job; and upon determining a value of the performance indicator satisfies a predefined threshold, optimize the stream computing application to improve the value of the performance indicator. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A system for optimizing a stream computing application, comprising:
-
a computer processor; and a memory containing a program that, when executed on the computer processor, performs an operation for processing data, comprising; executing a first job and a second job, each comprising a plurality of respective operators that process streaming data, wherein the plurality of respective operators in the first and second jobs are, respectively, interconnected such that data tuples flow between the plurality of respective operators to perform the first and second jobs; establishing an operator graph comprising the plurality of respective operators of both the first and second jobs, the operator graph defining at least one respective execution path through the plurality of respective operators for the first job and for the second job; while the first and second jobs are executing, establishing a connection between the first job and the second job by transmitting a data stream from a first operator of the first job to a second operator of the second job, wherein the first and second jobs are in the operator graph both before and after the connection is established; before establishing the connection between the first job to the second job, setting the data stream as exportable, wherein the plurality of respective operators associated with the second job do not receive data from, or send data to, the plurality of respective operators associated with the first job prior to setting the data stream as exportable; monitoring a performance indicator associated with the first operator of the first job, the performance indicator measuring an effect that the connection between the first and second jobs has on a performance of the first job; and upon determining a value of the performance indicator satisfies a predefined threshold, optimizing the stream computing application to improve the value of the performance indicator. - View Dependent Claims (7, 8, 9, 10)
-
-
11. A computer program product for optimizing a stream computing application comprising:
a non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising computer-readable program code configured to; execute a first job and a second job, each comprising a plurality of respective operators that process streaming data by operation of one or more computer processors, wherein the plurality of respective operators in the first and second jobs are, respectively, interconnected such that data tuples flow between the plurality of respective operators to perform the first and second jobs; establish an operator graph comprising the plurality of respective operators of both the first and second jobs, the operator graph defining at least one respective execution path through the plurality of respective operators for the first job and for the second job; while the first and second jobs are executing, establish a connection between the first job and the second job by transmitting a data stream from a first operator of the first job to a second operator of the second job, wherein the first and second jobs are in the operator graph both before and after the connection is established; monitor a performance indicator associated with the first operator of the first job by recording at least one of a time when the first job connects to the second job and a number of jobs connected to the data stream of the first job, wherein the performance indicator measures an effect that the connection between the first and second jobs has on a performance of the first job; and upon determining a value of the performance indicator satisfies a predefined threshold, optimize the stream computing application to improve the value of the performance indicator. - View Dependent Claims (12, 13)
Specification