Method for resource optimization for parallel data integration
First Claim
1. A method of optimizing resources for a parallel data integration job, the method comprising:
receiving a job request specifying a parallel data integration job to deploy in a grid wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run;
predicting grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs;
selecting a grid resource configuration for running the parallel data integration job, including automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and
generating the at least one resource utilization index for the job, comprising:
generating resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions;
generating a respective operator index maximum for each respective operator;
generating, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group;
selecting a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups;
selecting a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and
generating the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
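The index-generation steps recited in the claim can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the `Operator` class, the group labels (`"source"`, `"sink"`, `"processing"`, `"scratch"`), and the sample values are hypothetical, and the claim does not state the direction of the ratio, so first maximum divided by second maximum is an assumption here.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical group labels; the claim names the groups but not their encoding.
IO_GROUPS = ("source", "sink")
COMPUTE_GROUPS = ("processing", "scratch")


@dataclass
class Operator:
    name: str
    group: str                 # one of the labels above (assumed)
    indices: Dict[str, float]  # predicted utilization index per resource portion


def group_maxima(operators: List[Operator]) -> Dict[str, float]:
    """Per group, the maximum of the operator index maxima in that group."""
    maxima: Dict[str, float] = {}
    for op in operators:
        op_max = max(op.indices.values())  # operator index maximum
        maxima[op.group] = max(maxima.get(op.group, op_max), op_max)
    return maxima


def job_index(operators: List[Operator]) -> float:
    """Ratio of the I/O-side maximum to the compute-side maximum (assumed order)."""
    maxima = group_maxima(operators)
    first = max(maxima[g] for g in IO_GROUPS if g in maxima)
    second = max(maxima[g] for g in COMPUTE_GROUPS if g in maxima)
    if second == 0:
        raise ValueError("second maximum is zero; ratio is undefined")
    return first / second


# Made-up example values, for illustration only.
ops = [
    Operator("read", "source", {"cpu": 0.2, "disk": 0.8}),
    Operator("write", "sink", {"cpu": 0.1, "disk": 0.6}),
    Operator("join", "processing", {"cpu": 0.5, "mem": 0.4}),
    Operator("sort", "scratch", {"cpu": 0.3, "disk": 0.25}),
]
result = job_index(ops)
```

With these sample values the source/sink side peaks at 0.8 and the processing/scratch side at 0.5, so a job index above 1 would suggest, under this reading, a job that leans more heavily on I/O than on computation.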
Abstract
For optimizing resources for a parallel data integration job, a job request is received, which specifies a parallel data integration job to deploy in a grid. Grid resource utilizations are predicted for hypothetical runs of the specified job on respective hypothetical grid resource configurations. This includes automatically predicting grid resource utilizations by a resource optimizer module responsive to a model based on a plurality of actual runs of previous jobs. A grid resource configuration is selected for running the parallel data integration job, which includes the optimizer module automatically selecting a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion.
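The predict-then-select flow described in the abstract might look roughly like the sketch below. Everything named here is an assumption made for illustration: the `(nodes, partitions_per_node)` configuration shape, the `toy_predict` function standing in for a model trained on earlier runs, and the `toy_criterion` that prefers a peak utilization near a target value. The abstract does not specify any of these details.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Tuple[int, int]        # hypothetical: (nodes, partitions_per_node)
Utilization = Dict[str, float]  # predicted utilization per resource


def select_configuration(
    configs: Iterable[Config],
    predict: Callable[[Config], Utilization],
    criterion: Callable[[Utilization], float],
) -> Config:
    """Return the candidate whose predicted utilization scores lowest."""
    best: Optional[Config] = None
    best_score: Optional[float] = None
    for config in configs:
        score = criterion(predict(config))  # hypothetical run, no deployment
        if best_score is None or score < best_score:
            best, best_score = config, score
    if best is None:
        raise ValueError("no candidate configurations supplied")
    return best


def toy_predict(config: Config) -> Utilization:
    """Stand-in for a model fitted to performance data from previous jobs."""
    nodes, partitions = config
    return {"cpu": 8.0 / (nodes * partitions), "disk": 4.0 / nodes}


def toy_criterion(utilization: Utilization) -> float:
    """Assumed criterion: peak utilization as close as possible to 0.5."""
    return abs(max(utilization.values()) - 0.5)


candidates = [(1, 2), (2, 2), (4, 2), (8, 2)]
chosen = select_configuration(candidates, toy_predict, toy_criterion)
```

Passing the predictor and criterion as callables keeps the selection loop independent of how the model is built, which matches the abstract's separation between prediction and selection.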
8 Claims
Specification