
Management of intermediate data spills during the shuffle phase of a map-reduce job

  • US 9,424,274 B2
  • Filed: 06/03/2013
  • Issued: 08/23/2016
  • Est. Priority Date: 06/03/2013
  • Status: Active Grant
First Claim

1. A distributed computer system configured for spill management during a shuffle phase of a map-reduce job performed by said distributed computer system on distributed files, said distributed computer system comprising:

    a) key-value pairs (ki,vi) belonging to said distributed files on which said map-reduce job is performed;

    b) a number of map nodes for performing a pre-shuffle phase of said map-reduce job on said key-value pairs (ki,vi) to generate keyed partitions (Ki,PRTj);

    c) storage resources for spilling said keyed partitions (Ki,PRTj) in accordance with a spilling protocol based on at least one popularity attribute of said key-value pairs (ki,vi);

    d) a number of reduce nodes provided with said spilling protocol to enable said reduce nodes to locate and access said keyed partitions (Ki,PRTj) during said shuffle phase by utilizing a path to said keyed partitions (Ki,PRTj), said path sent in the header of an empty HTTP message;

    e) said keyed partitions (Ki,PRTj) stored in a shared directory under a mount point, said shared directory accessible by said map nodes and said reduce nodes;

    wherein said distributed computer system executes a post-shuffle phase of said map-reduce job to produce an output list of said map-reduce job.
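
The data flow recited in elements a) through e) can be illustrated with a minimal single-process sketch. It is not the patented implementation. The sketch assumes that the popularity attribute is simply how often a key occurs, that popular partitions are spilled to a "hot" subdirectory and the rest to a "cold" one, and that a temporary directory stands in for the shared directory under a mount point. The header name X-Partition-Path, the threshold HOT_THRESHOLD, the JSON file layout, and all function names are hypothetical; keys are assumed to be safe to use as file names.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict

HOT_THRESHOLD = 2                 # hypothetical: keys seen this often count as popular
PATH_HEADER = "X-Partition-Path"  # hypothetical header name, not taken from the claim


def spill_partitions(pairs, shared_dir):
    """Map side: group values by key and spill each keyed partition.

    Popular keys are written under 'hot/', all others under 'cold/'.
    Returns a dict mapping each key to the path of its spilled partition.
    """
    counts = Counter(key for key, _ in pairs)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    paths = {}
    for key, values in grouped.items():
        tier = "hot" if counts[key] >= HOT_THRESHOLD else "cold"
        tier_dir = os.path.join(shared_dir, tier)
        os.makedirs(tier_dir, exist_ok=True)
        path = os.path.join(tier_dir, f"{key}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(values, handle)
        paths[key] = path
    return paths


def path_message(path):
    """Build an HTTP response with an empty body; only the header carries the path."""
    return f"HTTP/1.1 200 OK\r\n{PATH_HEADER}: {path}\r\nContent-Length: 0\r\n\r\n"


def read_partition(message):
    """Reduce side: take the path from the header and read the partition directly."""
    head, _, body = message.partition("\r\n\r\n")
    if body:
        raise ValueError("expected an empty message body")
    header_lines = head.split("\r\n")[1:]  # skip the status line
    headers = dict(line.split(": ", 1) for line in header_lines)
    with open(headers[PATH_HEADER], encoding="utf-8") as handle:
        return json.load(handle)


def word_count(pairs):
    """Post-shuffle phase: reduce every keyed partition to a sum."""
    with tempfile.TemporaryDirectory() as shared_dir:  # stand-in for the shared mount
        paths = spill_partitions(pairs, shared_dir)
        return {key: sum(read_partition(path_message(path)))
                for key, path in paths.items()}
```

For example, word_count([("a", 1), ("b", 1), ("a", 1)]) spills the partition for "a" under hot/ and the partition for "b" under cold/, then reduces them to {"a": 2, "b": 1}. Because only a path travels in the message header, the partition data itself is read from shared storage rather than streamed between nodes.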
