Management of intermediate data spills during the shuffle phase of a map-reduce job

  • US 9,740,706 B2
  • Filed: 06/21/2016
  • Issued: 08/22/2017
  • Est. Priority Date: 06/03/2013
  • Status: Active Grant
First Claim

1. A distributed computer system configured for spill management during a shuffle phase of a map-reduce job performed by said distributed computer system on distributed files, said distributed computer system comprising:

    (a) key-value pairs (ki,vi) belonging to said distributed files on which said map-reduce job is performed;

    (b) a number of map nodes for performing a pre-shuffle phase of said map-reduce job on said key-value pairs (ki,vi) to generate keyed partitions (Ki,PRTj);

    (c) storage resources for spilling said keyed partitions (Ki,PRTj), said spilling managed by a spilling protocol utilizing at least one popularity attribute of said key-value pairs (ki,vi);

    (d) said popularity attribute of said key-value pairs (ki,vi) determined in accordance with at least one element selected from the group consisting of relevance ranking of said key-value pairs (ki,vi) to a topic of interest, number of times that said key-value pairs (ki,vi) are used in computations and level of trust of data sources from which said key-value pairs (ki,vi) were obtained;

    (e) a number of reduce nodes provided with said spilling protocol to enable said reduce nodes to locate and access said keyed partitions (Ki,PRTj) during said shuffle phase by utilizing a path to said keyed partitions (Ki,PRTj);

    wherein said distributed computer system executes a post-shuffle phase of said map-reduce job to produce an output of said map-reduce job.
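As an informal illustration of elements (c) through (e), the sketch below shows one way a spilling protocol could turn a popularity attribute into a spill path that map nodes write to and reduce nodes later resolve. It is not taken from the patent: the `Popularity` class, the equal-weight score, the `TIERS` thresholds, and the `/fast`, `/medium`, `/slow` path prefixes are all hypothetical choices made for the example. It also assumes the reduce node has access to the same popularity values as the map node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Popularity:
    """Hypothetical container for the three elements named in claim element (d)."""
    relevance: float  # relevance ranking to a topic of interest, assumed in [0, 1]
    use_count: int    # number of times the pair was used in computations
    trust: float      # level of trust of the data source, assumed in [0, 1]


# Assumed storage tiers: (minimum score, path prefix), highest threshold first.
TIERS = [(0.66, "/fast"), (0.33, "/medium"), (0.0, "/slow")]


def popularity_score(p: Popularity, max_uses: int = 100) -> float:
    """Average the three elements after capping and normalizing the use count."""
    uses = min(p.use_count, max_uses) / max_uses
    return (p.relevance + uses + p.trust) / 3


def spill_path(key: str, partition: int, p: Popularity) -> str:
    """Return the path a keyed partition is spilled to under this toy protocol."""
    score = popularity_score(p)
    for threshold, prefix in TIERS:
        if score >= threshold:
            return f"{prefix}/{key}/part-{partition}"
    # Fallback for scores below every threshold (e.g. negative inputs).
    return f"{TIERS[-1][1]}/{key}/part-{partition}"


# A map node spills to the path; a reduce node recomputes the same path to fetch it.
hot = Popularity(relevance=0.9, use_count=80, trust=1.0)
cold = Popularity(relevance=0.1, use_count=0, trust=0.2)
print(spill_path("k1", 3, hot))   # /fast/k1/part-3
print(spill_path("k2", 0, cold))  # /slow/k2/part-0
```

Because the path is a pure function of the key, the partition index, and the popularity attribute, both sides can derive it without exchanging a lookup table. A real system might instead publish the paths through a shared directory service; the claim itself only requires that the reduce nodes be able to locate the partitions by a path.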
