Resource optimization for parallel data integration

  • US 8,935,702 B2
  • Filed: 09/04/2009
  • Issued: 01/13/2015
  • Est. Priority Date: 09/04/2009
  • Status: Expired due to Fees
First Claim

1. A program product for optimizing a parallel data integration job, the program product comprising:

  • a nontransitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:

    computer readable code configured to receive a job request specifying a parallel data integration job to deploy in a grid, wherein the job request includes operators specifying parallel integration operations performed when the parallel data integration job is run;

    computer readable code configured to predict grid resource utilizations for hypothetical runs of the specified job on respective hypothetical grid resource configurations responsive to a model based on performance data from a plurality of actual runs of previously deployed, parallel data jobs; and

    computer readable code configured to select a grid resource configuration for running the parallel data integration job, including resource optimizer module computer readable code configured to automatically select a grid resource configuration responsive to the predicted grid resource utilizations and an optimization criterion based on at least one resource utilization index for the job; and

    computer readable code configured to generate the at least one resource utilization index for the job, comprising:

    computer readable code configured to generate resource utilization indices for each respective operator responsive to the predicted grid resource utilizations on resource portions;

    computer readable code configured to generate a respective operator index maximum for each respective operator;

    computer readable code configured to generate, for each of a respective group of the operators, a respective maximum of the operator index maxima among the operators of the respective group;

    computer readable code configured to select a first maximum of resource utilization indices for a first and second subset of data source and sink operator groups;

    computer readable code configured to select a second maximum of resource utilization indices for a first and second subset of processing and scratch operator groups; and

    computer readable code configured to generate the at least one resource utilization index for the job responsive to a ratio of the first and second maxima.
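
The index computation recited in the final elements of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Operator` class, the group labels ("source", "sink", "processing", "scratch"), the function names, and the sample values are all hypothetical. The claim also does not state the direction of the ratio, so dividing the first maximum by the second is an assumption, and the claimed "first and second subset" of groups is simplified here to one label per group.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Operator:
    name: str
    group: str            # hypothetical label: "source", "sink", "processing", or "scratch"
    indices: List[float]  # predicted utilization indices, one per resource portion


def group_maxima(operators: List[Operator]) -> Dict[str, float]:
    """Return, for each group, the largest operator index maximum in that group."""
    maxima: Dict[str, float] = {}
    for op in operators:
        op_max = max(op.indices)  # operator index maximum for this operator
        maxima[op.group] = max(maxima.get(op.group, op_max), op_max)
    return maxima


def job_index(operators: List[Operator]) -> float:
    """Ratio of the source/sink maximum to the processing/scratch maximum.

    The ratio direction (first / second) is an assumption; the claim only
    says the job index is "responsive to a ratio" of the two maxima.
    """
    maxima = group_maxima(operators)
    first = max(maxima[g] for g in ("source", "sink") if g in maxima)
    second = max(maxima[g] for g in ("processing", "scratch") if g in maxima)
    if second == 0:
        raise ValueError("processing/scratch maximum is zero; ratio is undefined")
    return first / second


# Made-up sample values, chosen only to exercise the functions.
operators = [
    Operator("read_db", "source", [0.25, 0.5]),
    Operator("write_file", "sink", [0.125, 0.25]),
    Operator("transform", "processing", [0.25, 0.125]),
    Operator("sort_tmp", "scratch", [0.0625, 0.125]),
]

print(job_index(operators))
```

Under these assumptions, a job index above 1 would suggest that source and sink operators dominate the predicted utilization, while a value below 1 would suggest that processing and scratch operators dominate. How the resource optimizer module uses the index as an optimization criterion is not specified in this claim.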
