Dynamic optimization of workload execution based on statistical data collection and updated job profiling
First Claim
1. A system, comprising:
a processor; and
a memory storing a program, which, when executed on the processor, performs an operation, the operation comprising:
retrieving, via a processor, a job profile for a processing job, wherein the processing job has a plurality of processing stages specified in an execution profile, wherein the execution profile specifies a respective set of one or more parameters for each of one or more of the processing stages of the processing job, and wherein the job profile includes statistical data for at least one of the processing stages obtained during prior executions of the processing job, wherein the statistical data includes an amount of data that was spilled to a disk during at least one of the one or more processing stages, and wherein the statistical data was obtained by, for each of the processing stages during at least a first of the prior executions:
identifying a plurality of operations in the processing stage, and
during performance of each of the operations:
determining whether a flag is set for the operation, wherein the flag indicates to gather statistical data relating to the operation, and
upon determining that a flag is set, collecting the statistical data relating to the operation,
identifying a plurality of optimizations to apply to the execution profile based on the statistical data and based on an indication that at least a first one of the processing stages of the processing job was selected for optimization during one of the prior executions,
receiving, via a user interface, a selection of at least a first optimization of the identified optimizations to apply to the execution profile, wherein the first optimization comprises a selection of a compression algorithm;
modifying the respective set of one or more parameters of the execution profile for the at least the first one of the processing stages of the processing job by applying the selected first optimization to the at least the first one of the processing stages to optimize an execution environment used to execute the processing job, and
executing the processing job in the optimized execution environment based on the modified execution profile.
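The flag-gated collection step recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Operation` and `Stage` classes, the `collect_stats` field, and the convention that each operation returns the number of bytes it spilled to disk are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Operation:
    name: str
    run: Callable[[], int]        # hypothetical: returns bytes spilled to disk
    collect_stats: bool = False   # hypothetical name for the per-operation flag


@dataclass
class Stage:
    name: str
    operations: List[Operation] = field(default_factory=list)


def run_stage_collecting_stats(stage: Stage) -> Dict[str, int]:
    """Run every operation in a stage; record spill only for flagged operations."""
    stats: Dict[str, int] = {}
    for op in stage.operations:
        spilled = op.run()
        if op.collect_stats:      # gather statistics only when the flag is set
            stats[op.name] = spilled
    return stats


stage = Stage("sort", [
    Operation("partition", lambda: 0, collect_stats=False),
    Operation("merge", lambda: 4096, collect_stats=True),
])
stage_stats = run_stage_collecting_stats(stage)
```

In this sketch only the flagged `merge` operation contributes an entry to the collected statistics; the unflagged `partition` operation still runs but is not recorded.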
Abstract
Embodiments presented herein provide techniques for optimizing parallel data flows of a batch processing job using a profile of the processing job. An application retrieves a job profile for a processing job. The processing job has a plurality of processing stages specified in an execution profile. The job profile includes statistical data for at least one of the processing stages obtained during prior executions of the job. The application modifies properties of the execution profile based on the job profile to optimize the execution of the job. The application executes the processing job with the modified execution profile.
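The retrieve, modify, execute flow described in the abstract might look roughly like the following Python sketch. The abstract does not say which properties are changed or by what rule, so the `buffer_mb` parameter, the doubling rule, the spill threshold, and the `JOB_PROFILES` store are hypothetical choices used only to make the flow concrete.

```python
import copy
from typing import Dict, List

Profile = Dict[str, Dict[str, int]]

# Hypothetical store of statistics from prior executions, keyed by job name.
JOB_PROFILES: Dict[str, Profile] = {
    "nightly_etl": {"sort": {"spilled_bytes": 8_000_000}},
}


def retrieve_job_profile(job_name: str) -> Profile:
    """Look up the job profile; an unknown job yields an empty profile."""
    return JOB_PROFILES.get(job_name, {})


def modify_execution_profile(execution_profile: Profile, job_profile: Profile,
                             spill_threshold: int = 1_000_000) -> Profile:
    """Return a tuned copy of the execution profile (assumed rule: double the
    buffer of any stage whose recorded spill exceeds the threshold)."""
    modified = copy.deepcopy(execution_profile)
    for stage_name, stats in job_profile.items():
        if stage_name in modified and stats.get("spilled_bytes", 0) > spill_threshold:
            modified[stage_name]["buffer_mb"] *= 2
    return modified


def execute_job(execution_profile: Profile) -> List[str]:
    """Stand-in for real execution: report the parameters each stage ran with."""
    return [f"{stage} buffer_mb={params['buffer_mb']}"
            for stage, params in execution_profile.items()]


original = {"extract": {"buffer_mb": 64}, "sort": {"buffer_mb": 64}}
tuned = modify_execution_profile(original, retrieve_job_profile("nightly_etl"))
log = execute_job(tuned)
```

The sketch copies the execution profile before changing it, so the original parameters remain available for comparison with the tuned run.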
13 Claims
1. A system, comprising:
a processor; and
a memory storing a program, which, when executed on the processor, performs an operation, the operation comprising:
retrieving, via a processor, a job profile for a processing job, wherein the processing job has a plurality of processing stages specified in an execution profile, wherein the execution profile specifies a respective set of one or more parameters for each of one or more of the processing stages of the processing job, and wherein the job profile includes statistical data for at least one of the processing stages obtained during prior executions of the processing job, wherein the statistical data includes an amount of data that was spilled to a disk during at least one of the one or more processing stages, and wherein the statistical data was obtained by, for each of the processing stages during at least a first of the prior executions:
identifying a plurality of operations in the processing stage, and
during performance of each of the operations:
determining whether a flag is set for the operation, wherein the flag indicates to gather statistical data relating to the operation, and
upon determining that a flag is set, collecting the statistical data relating to the operation,
identifying a plurality of optimizations to apply to the execution profile based on the statistical data and based on an indication that at least a first one of the processing stages of the processing job was selected for optimization during one of the prior executions,
receiving, via a user interface, a selection of at least a first optimization of the identified optimizations to apply to the execution profile, wherein the first optimization comprises a selection of a compression algorithm;
modifying the respective set of one or more parameters of the execution profile for the at least the first one of the processing stages of the processing job by applying the selected first optimization to the at least the first one of the processing stages to optimize an execution environment used to execute the processing job, and
executing the processing job in the optimized execution environment based on the modified execution profile.
Dependent claims: 2, 3, 4, 5, 6
7. A computer program product, comprising:
a non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code configured to perform an operation, the operation comprising:
retrieving, via a processor, a job profile for a processing job, wherein the processing job has a plurality of processing stages specified in an execution profile, wherein the execution profile specifies a respective set of one or more parameters for each of one or more of the processing stages of the processing job, and wherein the job profile includes statistical data for at least one of the processing stages obtained during a prior execution of the processing job, wherein the statistical data includes an amount of data that was spilled to a disk during at least one of the one or more processing stages, and wherein the statistical data was obtained by, for each of the processing stages during at least a first of the prior executions:
identifying a plurality of operations in the processing stage, and
during performance of each of the operations:
determining whether a flag is set for the operation, wherein the flag indicates to gather statistical data relating to the operation, and
upon determining that a flag is set, collecting the statistical data relating to the operation;
identifying a plurality of optimizations to apply to the execution profile based on the statistical data and based on an indication that at least a first one of the processing stages of the processing job was selected for optimization during one of the prior executions,
receiving, via a user interface, a selection of at least a first optimization of the identified optimizations to apply to the execution profile, wherein the first optimization comprises a selection of a compression algorithm;
modifying the respective set of one or more parameters of the execution profile for the at least the first one of the processing stages of the processing job by applying the selected first optimization to the at least the first one of the processing stages to optimize an execution environment used to execute the processing job; and
executing the processing job in the optimized execution environment based on the modified execution profile.
Dependent claims: 8, 9, 10, 11, 12, 13
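The identify, select, and apply steps of the independent claims can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed implementation: the claims do not name any compression algorithm or selection rule, so the `gzip` and `lz4` candidates, the spill threshold, the `compression` parameter name, and the `choose` callback standing in for the user interface are all hypothetical.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, str]


def identify_optimizations(stats: Dict[str, Dict[str, int]],
                           selected_stages: List[str],
                           spill_threshold: int = 1_000_000) -> List[Candidate]:
    """Propose compression choices for stages that were selected for
    optimization and whose recorded spill exceeds the (assumed) threshold."""
    candidates: List[Candidate] = []
    for stage in selected_stages:
        if stats.get(stage, {}).get("spilled_bytes", 0) > spill_threshold:
            for algo in ("gzip", "lz4"):   # illustrative algorithm names
                candidates.append(
                    {"stage": stage, "param": "compression", "value": algo})
    return candidates


def apply_selected(execution_profile: Dict[str, Dict[str, str]],
                   candidates: List[Candidate],
                   choose: Callable[[List[Candidate]], Candidate]
                   ) -> Dict[str, Dict[str, str]]:
    """Apply the optimization picked by `choose`, a stand-in for the user
    interface, to a copy of the execution profile."""
    updated = {stage: dict(params) for stage, params in execution_profile.items()}
    if not candidates:
        return updated                     # nothing to offer, nothing to apply
    chosen = choose(candidates)
    updated[chosen["stage"]][chosen["param"]] = chosen["value"]
    return updated


stats = {"sort": {"spilled_bytes": 8_000_000}, "load": {"spilled_bytes": 10}}
profile = {"sort": {"compression": "none"}, "load": {"compression": "none"}}
options = identify_optimizations(stats, selected_stages=["sort", "load"])
# Simulated user selection: pick the lz4 option from the offered list.
new_profile = apply_selected(
    profile, options, lambda opts: next(o for o in opts if o["value"] == "lz4"))
```

Keeping the selection behind a callback separates the proposal logic from the user interface, so the same functions could be driven by a dialog, a command-line prompt, or a test.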
Specification