
Method for shard assignment in a large-scale data processing job

  • US 9,298,760 B1
  • Filed: 08/03/2012
  • Issued: 03/29/2016
  • Est. Priority Date: 08/03/2012
  • Status: Active Grant
First Claim

1. A system for shard assignment in a distributed data processing system, the system comprising:

    one or more processing devices;

    one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to implement:

    a plurality of worker processes and a master process for coordinating a data processing job that:

    divides an input dataset into a plurality of shards;

    indexes the plurality of shards;

    aggregates the plurality of shards into one or more groups based on the shards' indices;

    initially assigns an indexed shard from each group to a worker process; and

    in response to a worker having processed its initially assigned indexed shard, assigns subsequent shards from the same group as the initially assigned shard to the worker process based on the index of the previously-assigned shard.
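The claimed steps can be illustrated with a short Python sketch. It is a minimal reading of the claim under stated assumptions, not the patented implementation: the `Master` class, fixed-size shards, grouping by contiguous index ranges, one worker per group, and "next shard = previous index + 1" are all hypothetical choices, since the claim does not say how groups are formed or how the next index is derived from the previous one.

```python
import math
from typing import Dict, List, Optional, Sequence


class Master:
    """Hypothetical master process that assigns indexed shards group by group."""

    def __init__(self, dataset: Sequence, shard_size: int, num_groups: int) -> None:
        # Divide the input dataset into shards; a shard's index is its position.
        self.shards = [dataset[i:i + shard_size]
                       for i in range(0, len(dataset), shard_size)]
        # Aggregate shards into groups of contiguous indices (assumed grouping rule).
        per_group = max(1, math.ceil(len(self.shards) / num_groups))
        self.groups: List[range] = [
            range(start, min(start + per_group, len(self.shards)))
            for start in range(0, len(self.shards), per_group)
        ]
        self.worker_group: Dict[str, int] = {}  # worker id -> group id
        self.last_index: Dict[str, int] = {}    # worker id -> last assigned shard index

    def assign_initial(self, workers: List[str]) -> Dict[str, int]:
        """Initially assign the first indexed shard of each group to one worker."""
        assignments = {}
        for group_id, (worker, group) in enumerate(zip(workers, self.groups)):
            self.worker_group[worker] = group_id
            self.last_index[worker] = group.start
            assignments[worker] = group.start
        return assignments

    def on_shard_done(self, worker: str) -> Optional[int]:
        """Assign the next shard from the worker's group, based on its previous index."""
        group = self.groups[self.worker_group[worker]]
        next_index = self.last_index[worker] + 1
        if next_index not in group:
            return None  # group exhausted; this sketch simply leaves the worker idle
        self.last_index[worker] = next_index
        return next_index


if __name__ == "__main__":
    master = Master(list(range(20)), shard_size=2, num_groups=2)
    print(master.assign_initial(["w0", "w1"]))  # {'w0': 0, 'w1': 5}
    print(master.on_shard_done("w0"))           # 1
```

Under this grouping rule each worker walks a contiguous run of shard indices within its own group. What should happen once a group is exhausted (here the worker just receives `None`) is not addressed by the claim text quoted above.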

  • 3 Assignments