Resource optimization for parallel data integration
First Claim
1. A program product for optimizing a parallel data integration job, the program product comprising:
- a nontransitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable code configured to receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run;
computer readable code configured to predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs; and
computer readable code configured to select a grid resource configuration for running the parallel data integration job, including resource optimizer module computer readable code configured to automatically select a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and
computer readable code configured to generate the at least one resource utilization index for the job, comprising:
computer readable code configured to generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions;
computer readable code configured to generate a respective operator index maximum for each respective operator;
computer readable code configured to generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group;
computer readable code configured to select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups;
computer readable code configured to select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and
computer readable code configured to generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
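The index computation recited in the claim can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the `Operator` class, the group labels `"source"`, `"sink"`, `"processing"` and `"scratch"`, the resource-portion names, and the choice to divide the first maximum by the second (the claim says only "a ratio") are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical group labels; the claim names the groups but not their encoding.
IO_GROUPS = ("source", "sink")
COMPUTE_GROUPS = ("processing", "scratch")


@dataclass
class Operator:
    name: str
    group: str                 # one of the hypothetical labels above
    indices: Dict[str, float]  # predicted utilization index per resource portion


def operator_max(op: Operator) -> float:
    """Operator index maximum: the largest index over the operator's resource portions."""
    return max(op.indices.values())


def group_maxima(operators: Iterable[Operator]) -> Dict[str, float]:
    """For each group, the maximum of the operator index maxima in that group."""
    maxima: Dict[str, float] = {}
    for op in operators:
        peak = operator_max(op)
        maxima[op.group] = max(maxima.get(op.group, peak), peak)
    return maxima


def job_index(operators: Iterable[Operator]) -> float:
    """Job-level index as first maximum / second maximum (direction is an assumption)."""
    groups = group_maxima(operators)
    io = [groups[g] for g in IO_GROUPS if g in groups]
    compute = [groups[g] for g in COMPUTE_GROUPS if g in groups]
    if not io or not compute:
        raise ValueError("need at least one I/O group and one processing/scratch group")
    first, second = max(io), max(compute)
    if second == 0:
        raise ValueError("second maximum is zero; ratio undefined")
    return first / second
```

With one source operator peaking at 0.6 and one processing operator peaking at 0.8, `job_index` returns a value close to 0.75. Under this reading, a value above 1 would suggest the job is bound by data source or sink operators, and a value below 1 by processing or scratch operators.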
Abstract
For optimizing resources for a parallel data integration job, a job request is received, which specifies a parallel data integration job to deploy in a grid. Grid resource utilizations are predicted for hypothetical runs of the specified job on respective hypothetical grid resource configurations. This includes automatically predicting grid resource utilizations by a resource optimizer module responsive to a model based on a plurality of actual runs of previous jobs. A grid resource configuration is selected for running the parallel data integration job, which includes the optimizer module automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion.
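The predict-then-select flow described in the abstract can be illustrated with a small sketch. Everything here is an assumption for illustration: the abstract does not define the model, so `make_predictor` substitutes a toy model (total work is constant and spreads evenly across partitions), and the configuration shape `{"partitions": n}` and the criterion function are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, int]  # hypothetical shape, e.g. {"partitions": 4}


def make_predictor(history: List[Tuple[int, float]]) -> Callable[[Config], float]:
    """Build a toy model from actual runs given as (partitions, observed utilization).

    Assumption: total work is constant and divides evenly across partitions,
    so predicted utilization = mean observed work / partitions.
    """
    if not history:
        raise ValueError("at least one actual run is required")
    mean_work = sum(parts * util for parts, util in history) / len(history)

    def predict(cfg: Config) -> float:
        return mean_work / cfg["partitions"]

    return predict


def select_configuration(
    candidates: Iterable[Config],
    predict: Callable[[Config], float],
    criterion: Callable[[float], float],
) -> Config:
    """Return the hypothetical configuration whose predicted utilization scores lowest."""
    return min(candidates, key=lambda cfg: criterion(predict(cfg)))


# Usage: pick the candidate whose predicted utilization is closest to a 60% target.
predict = make_predictor([(2, 0.9), (4, 0.45)])
candidates = [{"partitions": n} for n in (2, 3, 4, 6)]
best = select_configuration(candidates, predict, lambda u: abs(u - 0.6))
```

Passing the model and the criterion in as functions keeps the selection step independent of how predictions are produced, which mirrors the abstract's separation between the model and the optimization criterion.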
11 Claims
1. A program product for optimizing a parallel data integration job, the program product comprising:
- a nontransitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable code configured to receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run;
computer readable code configured to predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs; and
computer readable code configured to select a grid resource configuration for running the parallel data integration job, including resource optimizer module computer readable code configured to automatically select a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and
computer readable code configured to generate the at least one resource utilization index for the job, comprising:
computer readable code configured to generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions;
computer readable code configured to generate a respective operator index maximum for each respective operator;
computer readable code configured to generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group;
computer readable code configured to select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups;
computer readable code configured to select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and
computer readable code configured to generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system comprising:
- at least one storage system for storing a parallel data integration job resource optimization program; and
at least one processor for processing the parallel data integration job resource optimization program, the system being configured with the program and the processor to:
receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run;
predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs;
select a grid resource configuration for running the parallel data integration job, including an optimizer module automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and
generate the at least one resource utilization index for the job, comprising:
generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions;
generate a respective operator index maximum for each respective operator;
generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group;
select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups;
select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and
generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
Dependent claims: 8, 9, 10, 11
Specification