Scheduling MapReduce tasks based on estimated workload distribution

  • US 9,891,950 B2
  • Filed: 01/26/2017
  • Issued: 02/13/2018
  • Est. Priority Date: 08/26/2015
  • Status: Active Grant
First Claim

1. A computer system comprising:

  • one or more computer processors;

    one or more computer-readable storage media;

    program instructions stored on the computer-readable storage media for execution by at least one of the one or more processors, the program instructions comprising instructions to:

    receive a set of task statistics corresponding to task execution within a MapReduce job, wherein the set of task statistics includes a job input size, a cluster size, an average map process rate, a shuffle data size, a network bandwidth, and a convergence of a workload distribution corresponding to the set of executed tasks;

    estimate a completion time corresponding to a map task completion time and a shuffle operation completion time to provide an estimated completion time;

    calculate a soft decision point based on a convergence of a workload distribution corresponding to a set of executed tasks, wherein the soft decision point corresponds to a point at which a workload is most evenly distributed among available resources;

    calculate a hard decision point (HDP) according to the equation HDP = max{0, map task completion time − shuffle operation completion time};

    determine a selected decision point based on the soft decision point and the hard decision point, wherein the selected decision point is the lesser of the soft decision point and the hard decision point; and

    schedule and execute a next set of tasks at the selected decision point.
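
The decision-point logic recited in the claim can be illustrated with a minimal Python sketch. The claim does not say how the map and shuffle completion times are estimated or how the soft decision point is derived from the convergence of the workload distribution, so the two estimator formulas below (input size divided by aggregate map rate, shuffle size divided by bandwidth) are assumptions made only for illustration, and the soft decision point is simply taken as an input. All names (`TaskStatistics`, `estimate_map_time`, and so on) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class TaskStatistics:
    """Statistics named in the claim; units are assumptions for this sketch."""
    job_input_size: float        # bytes
    cluster_size: int            # number of parallel map slots (assumed meaning)
    avg_map_process_rate: float  # bytes per second per slot (assumed meaning)
    shuffle_data_size: float     # bytes
    network_bandwidth: float     # bytes per second


def estimate_map_time(stats: TaskStatistics) -> float:
    # Assumption: map work is spread evenly over all slots.
    return stats.job_input_size / (stats.cluster_size * stats.avg_map_process_rate)


def estimate_shuffle_time(stats: TaskStatistics) -> float:
    # Assumption: shuffle time is bounded by network transfer alone.
    return stats.shuffle_data_size / stats.network_bandwidth


def hard_decision_point(map_time: float, shuffle_time: float) -> float:
    # From the claim: HDP = max{0, map completion time - shuffle completion time}.
    return max(0.0, map_time - shuffle_time)


def select_decision_point(soft_point: float, hard_point: float) -> float:
    # From the claim: the selected point is the lesser of the two.
    return min(soft_point, hard_point)


if __name__ == "__main__":
    stats = TaskStatistics(
        job_input_size=1000.0,
        cluster_size=10,
        avg_map_process_rate=10.0,
        shuffle_data_size=400.0,
        network_bandwidth=100.0,
    )
    map_time = estimate_map_time(stats)          # 10.0
    shuffle_time = estimate_shuffle_time(stats)  # 4.0
    hdp = hard_decision_point(map_time, shuffle_time)  # 6.0
    # The soft decision point (8.0 here) is a made-up input value.
    print(select_decision_point(8.0, hdp))  # prints 6.0
```

Under these assumptions, the hard decision point is the latest moment at which the shuffle can start and still finish together with the map phase; the `max` with zero covers the case where the shuffle is expected to take longer than the map phase, so the next set of tasks would be scheduled immediately.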
