Stage-aware performance modeling for computer cluster sizing

  • US 9,891,959 B2
  • Filed: 10/30/2015
  • Issued: 02/13/2018
  • Est. Priority Date: 10/30/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method of configuring a computer cluster, the computer-implemented method comprising:

  • receiving, by a processor unit, job information identifying a data processing job to be performed, wherein the data processing job to be performed comprises a plurality of stages, and wherein the job information defines characteristics of the plurality of stages that include number of tasks, resource profile, data access pattern, output selectivity, amount of shuffle, resource consumption dynamicity, and data set content sensitivity of respective stages in the plurality of stages of the data processing job;

    receiving, by the processor unit, cluster information identifying a candidate computer cluster;

    identifying, by the processor unit, stage performance models corresponding to modeled stages having similar characteristics to the characteristics of the plurality of stages that include the number of tasks, resource profile, data access pattern, output selectivity, amount of shuffle, resource consumption dynamicity, and data set content sensitivity of the respective stages in the plurality of stages of the data processing job;

    predicting, by the processor unit, stage performance times for performing the plurality of stages on the candidate computer cluster using the stage performance models;

    combining, by the processor unit, the predicted stage performance times to determine a predicted job performance time;

    using, by the processor unit, the predicted job performance time to configure the candidate computer cluster to perform the data processing job; and

    performing, by the candidate computer cluster, the data processing job.
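
The steps recited above (look up a model per stage, predict each stage's time on a candidate cluster, combine the stage times, then use the result to choose a configuration) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the patented implementation: the names `Stage`, `Cluster`, `wave_model`, `MODELS`, `predict_job_time`, and `smallest_cluster` are invented here, the per-task times are made-up numbers, only two of the claimed stage characteristics (number of tasks and resource profile) are used, and the stage times are combined by simple summation on the assumption that stages run one after another.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    num_tasks: int
    resource_profile: str  # hypothetical labels: "cpu" or "io"


@dataclass(frozen=True)
class Cluster:
    nodes: int
    slots_per_node: int


StageModel = Callable[[Stage, Cluster], float]


def wave_model(seconds_per_task: float) -> StageModel:
    """Toy stage model: tasks run in waves that fill every slot in the cluster."""
    def predict(stage: Stage, cluster: Cluster) -> float:
        slots = cluster.nodes * cluster.slots_per_node
        waves = math.ceil(stage.num_tasks / slots)
        return waves * seconds_per_task
    return predict


# Assumed model library keyed by resource profile; the per-task times are invented.
MODELS: Dict[str, StageModel] = {
    "cpu": wave_model(10.0),
    "io": wave_model(30.0),
}


def predict_job_time(stages: List[Stage], cluster: Cluster) -> float:
    """Predict each stage with its matching model and sum the results
    (assumes stages execute sequentially)."""
    return sum(MODELS[s.resource_profile](s, cluster) for s in stages)


def smallest_cluster(stages: List[Stage], candidates: List[Cluster],
                     deadline_s: float) -> Optional[Cluster]:
    """Return the candidate with the fewest nodes that meets the deadline, if any."""
    for cluster in sorted(candidates, key=lambda c: c.nodes):
        if predict_job_time(stages, cluster) <= deadline_s:
            return cluster
    return None


job = [Stage("map", 100, "cpu"), Stage("reduce", 20, "io")]
candidates = [Cluster(nodes=5, slots_per_node=4), Cluster(nodes=10, slots_per_node=4)]
chosen = smallest_cluster(job, candidates, deadline_s=70.0)
```

In this sketch the 5-node candidate is predicted to need 5 waves of map tasks and 1 wave of reduce tasks, so it misses a 70-second deadline, while the 10-node candidate meets it. A fuller model would also account for the remaining claimed characteristics, such as data access pattern, output selectivity, and amount of shuffle.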