Distributed data set storage and retrieval
First Claim
1. An apparatus comprising a processor component and a storage to store instructions that, when executed by the processor component, cause the processor component to perform operations comprising:
provide, to a control device, an indication of being currently available to participate in a performance of a processing task as a node device among multiple node devices;
receive, from the control device, an indication of the processing task to perform with one or more data set portions of multiple data set portions of a data set, wherein the data set comprises data organized in a manner indicated in metadata;
perform the processing task with the one or more data set portions;
provide a request to the control device for a pointer to a location at which to store the one or more data set portions as a data block of multiple data blocks within a data file maintained by one or more storage devices, wherein:
the multiple data blocks are organized within the data file in a manner indicated in map data that comprises multiple map entries; and
each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks;
analyze the metadata to determine whether the metadata indicates that the data set is partitioned;
in response to an indication in the metadata that the data set comprises partitioned data, wherein the data within the data set is organized into multiple partitions that are each distributable to just a single node device, each map entry corresponds to a single data block and the processing task is able to be performed with the data within each partition independently of the data within any other partition of the multiple partitions, the processor component is caused to perform operations comprising:
for each data set portion of the one or more data set portions:
include a data sub-block size indicative of a size of the data set portion in the request;
derive a hashed identifier of a partition label of the partition to which the data set portion belongs of the multiple partitions; and
include the hashed identifier in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store each data set portion of the one or more data set portions as a data sub-block within the data block starting at the location within the data file; and
in response to a lack of indication in the metadata that the data set comprises partitioned data, the processor component is caused to perform operations comprising:
derive a sum of sizes of each data set portion of the one or more data set portions;
include the sum of sizes as a data block size of the data block in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store the one or more data set portions together as the data block at the location within the data file.
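The two branches of the claim can be illustrated with a minimal sketch of how a node device might assemble the pointer request. Everything named here is hypothetical and not taken from the claim: the `Portion` type, the `"partitioned"` metadata key, the dictionary layout of the request, and the choice of a truncated SHA-256 digest as the hashed identifier (the claim does not name a hash function).

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Portion:
    """Hypothetical container for one data set portion held by a node device."""
    data: bytes
    partition_label: Optional[str] = None


def hashed_identifier(label: str) -> int:
    # Assumption: 64-bit identifier taken from a SHA-256 digest of the label.
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_pointer_request(metadata: dict, portions: List[Portion]) -> dict:
    """Assemble the request sent to the control device for a storage pointer."""
    if metadata.get("partitioned", False):
        # Partitioned data: one sub-block size and one hashed identifier
        # per data set portion.
        return {
            "sub_blocks": [
                {"size": len(p.data),
                 "hashed_id": hashed_identifier(p.partition_label)}
                for p in portions
            ]
        }
    # Non-partitioned data: a single data block size, the sum of all portions.
    return {"block_size": sum(len(p.data) for p in portions)}
```

In this sketch the absence of the `"partitioned"` key is treated the same as a false value, mirroring the claim's "lack of indication in the metadata" wording.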
Abstract
An apparatus comprising a processor component to: provide, to a control device, an indication of availability to perform a processing task with one or more data set portions as a node device; perform a processing task specified by the control device with the one or more data set portions; and request a pointer to a location at which to store the one or more data set portions as a data block within a data file. In response to the data set including partitioned data, for each data set portion, include a data sub-block size of the data set portion and a hashed identifier derived from a partition label of a partition in the request; receive, from the control device, the requested pointer to the location; and store each data set portion as a data sub-block within the data block starting at the location within the data file.
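The final step of the abstract, storing each data set portion as a data sub-block within the data block starting at the location given by the pointer, might look roughly like the following. This is a sketch under assumptions: the pointer is treated as a plain byte offset, the data file as any seekable binary file object, and the function name `store_sub_blocks` is invented for illustration.

```python
import io
from typing import BinaryIO, List


def store_sub_blocks(data_file: BinaryIO, pointer: int, portions: List[bytes]) -> List[int]:
    """Write the portions back to back as sub-blocks, starting at `pointer`.

    Returns the byte offset at which each sub-block begins.
    """
    data_file.seek(pointer)
    offsets = []
    for portion in portions:
        offsets.append(data_file.tell())  # where this sub-block starts
        data_file.write(portion)
    return offsets


# Usage with an in-memory stand-in for the data file.
demo_file = io.BytesIO(b"\x00" * 4)
demo_offsets = store_sub_blocks(demo_file, 4, [b"abc", b"de"])
```

Because the sub-blocks are written contiguously, the sub-block sizes supplied in the request are enough to locate each one again from the data block's starting location.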
4 Citations
30 Claims
1. An apparatus comprising a processor component and a storage to store instructions that, when executed by the processor component, cause the processor component to perform operations comprising:
provide, to a control device, an indication of being currently available to participate in a performance of a processing task as a node device among multiple node devices;
receive, from the control device, an indication of the processing task to perform with one or more data set portions of multiple data set portions of a data set, wherein the data set comprises data organized in a manner indicated in metadata;
perform the processing task with the one or more data set portions;
provide a request to the control device for a pointer to a location at which to store the one or more data set portions as a data block of multiple data blocks within a data file maintained by one or more storage devices, wherein:
the multiple data blocks are organized within the data file in a manner indicated in map data that comprises multiple map entries; and
each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks;
analyze the metadata to determine whether the metadata indicates that the data set is partitioned;
in response to an indication in the metadata that the data set comprises partitioned data, wherein the data within the data set is organized into multiple partitions that are each distributable to just a single node device, each map entry corresponds to a single data block and the processing task is able to be performed with the data within each partition independently of the data within any other partition of the multiple partitions, the processor component is caused to perform operations comprising:
for each data set portion of the one or more data set portions:
include a data sub-block size indicative of a size of the data set portion in the request;
derive a hashed identifier of a partition label of the partition to which the data set portion belongs of the multiple partitions; and
include the hashed identifier in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store each data set portion of the one or more data set portions as a data sub-block within the data block starting at the location within the data file; and
in response to a lack of indication in the metadata that the data set comprises partitioned data, the processor component is caused to perform operations comprising:
derive a sum of sizes of each data set portion of the one or more data set portions;
include the sum of sizes as a data block size of the data block in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store the one or more data set portions together as the data block at the location within the data file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor component to perform operations comprising:
provide, to a control device, an indication of being currently available to participate in a performance of a processing task as a node device among multiple node devices;
receive, from the control device, an indication of the processing task to perform with one or more data set portions of multiple data set portions of a data set, wherein the data set comprises data organized in a manner indicated in metadata;
perform the processing task with the one or more data set portions;
provide a request to the control device for a pointer to a location at which to store the one or more data set portions as a data block of multiple data blocks within a data file maintained by one or more storage devices, wherein:
the multiple data blocks are organized within the data file in a manner indicated in map data that comprises multiple map entries; and
each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks;
analyze the metadata to determine whether the metadata indicates that the data set is partitioned;
in response to an indication in the metadata that the data set comprises partitioned data, wherein the data within the data set is organized into multiple partitions that are each distributable to just a single node device, each map entry corresponds to a single data block and the processing task is able to be performed with the data within each partition independently of the data within any other partition of the multiple partitions, the processor component is caused to perform operations comprising:
for each data set portion of the one or more data set portions:
include a data sub-block size indicative of a size of the data set portion in the request;
derive a hashed identifier of a partition label of the partition to which the data set portion belongs of the multiple partitions; and
include the hashed identifier in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store each data set portion of the one or more data set portions as a data sub-block within the data block starting at the location within the data file; and
in response to a lack of indication in the metadata that the data set comprises partitioned data, the processor component is caused to perform operations comprising:
derive a sum of sizes of each data set portion of the one or more data set portions;
include the sum of sizes as a data block size of the data block in the request;
receive, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after the performance of the processing task with the one or more data set portions, store the one or more data set portions together as the data block at the location within the data file.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. A computer-implemented method comprising:
providing, to a control device, an indication of being currently available to participate in a performance of a processing task as a node device among multiple node devices;
receiving, from the control device, an indication of the processing task to perform with one or more data set portions of multiple data set portions of a data set, wherein the data set comprises data organized in a manner indicated in metadata;
performing the processing task with the one or more data set portions;
providing a request to the control device for a pointer to a location at which to store the one or more data set portions as a data block of multiple data blocks within a data file maintained by one or more storage devices, wherein:
the multiple data blocks are organized within the data file in a manner indicated in map data that comprises multiple map entries; and
each map entry of the multiple map entries corresponds to one or more data blocks of the multiple data blocks;
analyzing the metadata to determine whether the metadata indicates that the data set is partitioned;
in response to an indication in the metadata that the data set comprises partitioned data, wherein the data within the data set is organized into multiple partitions that are each distributable to just a single node device, each map entry corresponds to a single data block and the processing task is able to be performed with the data within each partition independently of the data within any other partition of the multiple partitions, the method comprises:
for each data set portion of the one or more data set portions:
including, in the request, a data sub-block size indicative of a size of the data set portion;
deriving a hashed identifier of a partition label of the partition to which the data set portion belongs of the multiple partitions; and
including, in the request, the hashed identifier;
receiving, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after performing the processing task with the one or more data set portions, storing each data set portion of the one or more data set portions as a data sub-block within the data block starting at the location within the data file; and
in response to a lack of indication in the metadata that the data set comprises partitioned data:
deriving a sum of sizes of each data set portion of the one or more data set portions;
including the sum of sizes as a data block size of the data block in the request;
receiving, from the control device, the requested pointer indicating the location within the data file at which to store the data block; and
after performing the processing task with the one or more data set portions, storing the one or more data set portions together as the data block at the location within the data file.
Dependent claims: 24, 25, 26, 27, 28, 29, 30
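The independent claims describe the node-device side; the control device that answers pointer requests and maintains the map data is only implied. The following sketch shows one way such a control device could hand out pointers and record map entries. It is an illustration under stated assumptions, not the claimed design: the `ControlDevice` class, the request dictionaries (`"sub_blocks"` with `"size"` and `"hashed_id"`, or `"block_size"`), sequential allocation, and the merging of consecutive equal-size blocks into one map entry with a count are all hypothetical choices. The only constraints taken from the claims are that a map entry corresponds to one or more data blocks, and to exactly one data block when the data is partitioned.

```python
class ControlDevice:
    """Hypothetical control device that allocates locations in the data file."""

    def __init__(self, payload_start: int = 0):
        self.next_offset = payload_start  # next free byte in the data file
        self.map_entries = []             # the map data

    def handle_request(self, request: dict) -> int:
        """Return a pointer (byte offset) for the requested data block."""
        pointer = self.next_offset
        if "sub_blocks" in request:
            # Partitioned data: one map entry per data block, listing the
            # size and hashed identifier of every sub-block.
            size = sum(s["size"] for s in request["sub_blocks"])
            self.map_entries.append({"sub_blocks": list(request["sub_blocks"])})
        else:
            # Non-partitioned data: a map entry may cover several blocks.
            # Assumption: consecutive blocks of equal size share an entry.
            size = request["block_size"]
            last = self.map_entries[-1] if self.map_entries else None
            if last is not None and last.get("block_size") == size:
                last["block_count"] += 1
            else:
                self.map_entries.append({"block_size": size, "block_count": 1})
        self.next_offset += size
        return pointer
```

Allocating strictly in arrival order keeps the sketch short; a real control device would also need to handle concurrent requests from multiple node devices.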
Specification