
DISTRIBUTED DATA SET STORAGE AND RETRIEVAL

  • US 20170031599A1
  • Filed: 07/26/2016
  • Published: 02/02/2017
  • Est. Priority Date: 07/27/2015
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor component and a storage to store instructions that, when executed by the processor component, cause the processor component to perform operations comprising:

  • retrieve, from one or more storage devices through a network, metadata indicative of organization of data within a data set, and map data indicative of organization of multiple data blocks within a data file maintained by the one or more storage devices, wherein:

    the map data comprises multiple map entries; and

    each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks;

  • receive, from multiple node devices, indications of which node devices among the multiple node devices are available node devices that are each able to perform a processing task with at least one data set portion of the one or more data set portions; and

  • in response to an indication within the metadata or the map data that the data set comprises partitioned data wherein the data within the data set is organized into multiple partitions that are each distributable to a single node device, and each map entry corresponds to a single data block:

    determine a first quantity of the available node devices based on the indications of which node devices are available node devices;

    retrieve a second quantity of node devices last involved in storage of the data set within the data file from the metadata or the map data;

    compare the first and second quantities of node devices to detect a match between the first and second quantities;

    assign each of the available node devices one of a series of positive integer values as a designation value, wherein the series extends from an integer value of 0 to a positive integer value equal to the first quantity minus the integer value of 1; and

    in response to detection of a match between the first and second quantities, for each map entry of the map data:

      retrieve, from the map entry, a hashed identifier for one data sub-block indicated in the map entry as within the corresponding data block, and a data sub-block size for each of the data sub-blocks indicated in the map entry as within the corresponding data block, wherein:

        the hashed identifier is derived from a partition label of a partition of the multiple partitions; and

        the data sub-block comprises a data set portion of the one or more data set portions;

      determine a location of the corresponding data block within the data file;

      divide the hashed identifier by the first quantity to obtain a modulo value;

      compare the modulo value to the designation value assigned to each of the available node devices to identify an available node device assigned a designation value that matches the modulo value; and

      provide a pointer to the available node device assigned the designation value that matches the modulo value, the pointer comprising:

        an indication of the location of the corresponding data block; and

        a sum of the data sub-block sizes of all of the data sub-blocks within the corresponding data block.
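The distribution step of claim 1 can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the applicant's implementation: the names `MapEntry`, `Pointer`, `assign_pointers` and `hash_partition_label` are invented here, CRC-32 is an assumed stand-in for the unspecified hash of the partition label, and block locations are assumed to be contiguous in map-entry order starting at a hypothetical `base_offset`. Only the case where the count of available nodes matches the count stored with the data set is handled, because that is the case the claim describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
import zlib


@dataclass
class MapEntry:
    """One map entry describing a single data block (partitioned case)."""
    hashed_identifier: int      # hypothetical field: hash of the partition label
    sub_block_sizes: List[int]  # size of each data sub-block within the block


@dataclass
class Pointer:
    """Hypothetical pointer sent to a node: block location and total size."""
    offset: int
    length: int


def hash_partition_label(label: str) -> int:
    # Assumption: CRC-32 stands in for whatever hash derives the identifier.
    return zlib.crc32(label.encode("utf-8"))


def assign_pointers(
    entries: List[MapEntry],
    available_nodes: List[str],
    stored_node_count: int,
    base_offset: int = 0,
) -> Optional[Dict[str, List[Pointer]]]:
    first_quantity = len(available_nodes)
    # Compare the available-node count with the count stored in the
    # metadata/map data; this sketch only handles the matching case.
    if first_quantity == 0 or first_quantity != stored_node_count:
        return None

    # Designation values run from 0 to first_quantity - 1, one per node.
    designation = dict(enumerate(available_nodes))
    pointers: Dict[str, List[Pointer]] = {node: [] for node in available_nodes}

    offset = base_offset
    for entry in entries:
        # Pointer length is the sum of the sub-block sizes in the block.
        block_size = sum(entry.sub_block_sizes)
        # The remainder of hashed_identifier / first_quantity selects the node.
        modulo_value = entry.hashed_identifier % first_quantity
        node = designation[modulo_value]
        pointers[node].append(Pointer(offset=offset, length=block_size))
        # Assumption: data blocks are stored back to back in map-entry order.
        offset += block_size
    return pointers


if __name__ == "__main__":
    demo_entries = [
        MapEntry(hashed_identifier=7, sub_block_sizes=[100, 50]),
        MapEntry(hashed_identifier=4, sub_block_sizes=[30]),
        MapEntry(hashed_identifier=9, sub_block_sizes=[10, 10, 5]),
    ]
    result = assign_pointers(demo_entries, ["node-a", "node-b", "node-c"], 3)
    for name, ptrs in result.items():
        print(name, [(p.offset, p.length) for p in ptrs])
    # node-a [(180, 25)]
    # node-b [(0, 150), (150, 30)]
    # node-c []
```

Because every block of a given partition carries the same hashed identifier, the modulo always sends that partition's blocks to the same node. If the number of available nodes differed from the stored count, the remainders would change and the original placement would not be reproduced, which is presumably why the claim conditions this path on a match between the two quantities.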
