
Automated load-balancing of partitions in arbitrarily imbalanced distributed mapreduce computations

  • US 9,613,127 B1
  • Filed: 06/30/2014
  • Issued: 04/04/2017
  • Est. Priority Date: 06/30/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method of load-balancing in an arbitrarily imbalanced MapReduce job in a distributed computing system, the method comprising:

  • identifying K data keys with the highest frequency among received data, the received data comprising pairings of data keys and data values to be processed in the MapReduce job;

    assigning data for each of the K data keys to a single-key bucket and other data keys to multiple-key buckets;

    assigning one respective reduce phase worker to process data values corresponding to the data keys of each multiple-key bucket, each multiple-key bucket comprising queued data items having several different keys;

    assigning multiple reduce phase workers to process data values corresponding to the data key of each single-key bucket, wherein a number of multiple reduce phase workers to assign to a single-key bucket is determined according to a respective frequency of the data key assigned to the single-key bucket and a threshold level of acceptable imbalance in reduce phase worker loads across reduce phase workers; and

    stitching together output of the assigned multiple reduce phase workers on each respective single-key bucket.
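
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `plan` and `run_sum_job`, the CRC32 placement of the remaining keys, and the summing reducer are all hypothetical. The worker-count rule in particular is an assumption: the claim says only that the count depends on the key's frequency and on the acceptable imbalance threshold, so the sketch caps each worker's load at the average multiple-key bucket load scaled by (1 + threshold).

```python
import math
import zlib
from collections import Counter, defaultdict


def plan(records, k, num_multi_buckets, imbalance_threshold):
    """Return (hot key -> worker count, other key -> multiple-key bucket index)."""
    freq = Counter(key for key, _ in records)
    hot = dict(freq.most_common(k))  # the K most frequent keys

    # Assumed rule (not from the claim): no worker should carry more than
    # the average multiple-key bucket load times (1 + imbalance_threshold).
    cold_total = sum(n for key, n in freq.items() if key not in hot)
    baseline = max(1.0, cold_total / num_multi_buckets)
    cap = baseline * (1.0 + imbalance_threshold)

    workers = {key: max(1, math.ceil(n / cap)) for key, n in hot.items()}
    # Assumed placement: a deterministic hash spreads the remaining keys.
    bucket_of = {
        key: zlib.crc32(str(key).encode()) % num_multi_buckets
        for key in freq
        if key not in hot
    }
    return workers, bucket_of


def run_sum_job(records, k, num_multi_buckets, imbalance_threshold):
    """Simulate the reduce phase for a per-key sum; return (totals, workers)."""
    workers, bucket_of = plan(records, k, num_multi_buckets, imbalance_threshold)

    single = defaultdict(list)  # hot key -> its values (single-key bucket)
    multi = defaultdict(list)   # bucket index -> (key, value) items
    for key, value in records:
        if key in workers:
            single[key].append(value)
        else:
            multi[bucket_of[key]].append((key, value))

    totals = defaultdict(int)
    # One simulated worker per multiple-key bucket.
    for items in multi.values():
        for key, value in items:
            totals[key] += value

    # Several simulated workers per single-key bucket: each reduces one
    # slice of the values, then the partial outputs are stitched together.
    for key, values in single.items():
        n = workers[key]
        partials = [sum(values[i::n]) for i in range(n)]
        totals[key] = sum(partials)
    return dict(totals), workers
```

Stitching by summing partial results works here only because addition is associative and commutative. A reducer without that property would need a different way to combine the outputs of the workers sharing a single-key bucket.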

  • 9 Assignments