
Many task computing with distributed file system

  • US 10,650,046 B2
  • Filed: 09/30/2019
  • Issued: 05/12/2020
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • receive, at the processor and from a remote device, a request to perform a job flow using a flow input data set as an input to the job flow performance, wherein:

    the job flow is defined in a job flow definition that specifies a set of tasks to be performed via execution of a corresponding set of task routines during the job flow performance;

    at least one result report is to be generated as an output of the job flow performance;

    the job flow definition and each task routine of the set of task routines is stored as an undivided object within one storage device of a set of storage devices;

    the flow input data set is either stored as an undivided object within one storage device of the set of storage devices, or stored as a set of data object blocks into which the flow input data set is divided and distributed among the set of storage devices;

    each storage device of the set of storage devices incorporates a processor;

    the processors of the set of storage devices cooperate to maintain a distributed file system that spans storage spaces provided by each storage device of the set of storage devices;

    as part of maintaining the distributed file system, at least one processor of at least one storage device of the set of storage devices determines whether a data object received by the set of storage devices is to be stored as an undivided object or stored as a set of data object blocks into which the received data object is divided and distributed among the set of storage devices based on a size of the received data object compared to a distribution block size; and

    the flow input data set is stored as a set of data object blocks of the flow input data set by the set of storage devices in response to the flow input data set having a size larger than the distribution block size;

  • retrieve the job flow definition and each task routine of the set of task routines from the set of storage devices;

  • determine whether the flow input data set is stored as an undivided object or as a set of data object blocks based on the size of the flow input data set; and

  • in response to a determination that the flow input data set is stored as a set of data object blocks, the processor is caused to perform operations comprising:

    generate a container that contains the job flow definition and the set of task routines to enable the processor incorporated into each storage device to independently perform an instance of the job flow using one of the data object blocks of the flow input data set stored locally within the storage device as an input to the instance, wherein the performance of an instance of the job flow within each storage device generates a corresponding data object block of a set of data object blocks of the result report;

    provide a copy of the container to each storage device of the set of storage devices to enable the processors incorporated into at least two storage devices of the set of storage devices to perform instances of the job flow at least partially in parallel;

    retrieve, from each storage device of the set of storage devices, at least one data object block of the set of data object blocks of the result report;

    assemble the result report from the set of data object blocks of the result report; and

    transmit the result report to the remote device.
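The size-based storage decision recited in the claim (a received data object is kept undivided unless it is larger than the distribution block size, in which case it is divided into blocks distributed among the storage devices) can be illustrated with a minimal Python sketch. It is not the patented implementation: `StorageDevice`, `store_data_object`, `is_stored_as_blocks`, the 64 MiB `DISTRIBUTION_BLOCK_SIZE`, the CRC-based choice of device for undivided objects, and round-robin block placement are all hypothetical choices made for illustration.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical value: the claim does not specify a distribution block size.
DISTRIBUTION_BLOCK_SIZE = 64 * 1024 * 1024


@dataclass
class StorageDevice:
    """Toy stand-in for one storage device of the distributed file system."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


def is_stored_as_blocks(size: int, block_size: int = DISTRIBUTION_BLOCK_SIZE) -> bool:
    """A data object is divided only when it is larger than the block size."""
    return size > block_size


def store_data_object(name: str, data: bytes, devices: List[StorageDevice],
                      block_size: int = DISTRIBUTION_BLOCK_SIZE) -> str:
    """Store `data` undivided or as distributed blocks; return which mode was used."""
    if not is_stored_as_blocks(len(data), block_size):
        # Keep the object whole on one device (deterministic pick by CRC of the name;
        # this placement rule is an assumption, not stated in the claim).
        device = devices[zlib.crc32(name.encode()) % len(devices)]
        device.objects[name] = data
        return "undivided"
    # Divide into block_size chunks and spread them round-robin across devices
    # (round-robin placement is an assumption, not stated in the claim).
    for index, start in enumerate(range(0, len(data), block_size)):
        device = devices[index % len(devices)]
        device.objects[f"{name}.block{index}"] = data[start:start + block_size]
    return "blocks"


if __name__ == "__main__":
    cluster = [StorageDevice(f"dev{i}") for i in range(3)]
    print(store_data_object("config", b"abc", cluster, block_size=4))        # undivided
    print(store_data_object("input", b"abcdefghij", cluster, block_size=4))  # blocks
    print({d.name: sorted(d.objects) for d in cluster})
```

Because the same size comparison drives both storage and the later "determine whether the flow input data set is stored as ... a set of data object blocks" step, the sketch exposes it as a separate `is_stored_as_blocks` helper that a coordinating processor could call without touching the stored data.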
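The block-wise job flow performance in the claim (bundle the job flow definition and task routines into a container, give each storage device a copy, let each device run an instance of the job flow on its locally stored input block, then retrieve and assemble the result report blocks) can be sketched in Python under several assumptions. `Container`, `perform_instance`, `run_on_device`, and `perform_job_flow` are hypothetical names; threads from `concurrent.futures` merely simulate the storage devices' processors working at least partially in parallel; and assembling the report by concatenating result blocks in input-block order is an assumed rule that the claim does not spell out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

TaskRoutine = Callable[[bytes], bytes]


@dataclass(frozen=True)
class Container:
    """Hypothetical bundle of a job flow definition plus its task routines."""
    job_flow: List[str]               # ordered task names from the job flow definition
    routines: Dict[str, TaskRoutine]  # task name -> task routine


def perform_instance(container: Container, input_block: bytes) -> bytes:
    """Perform one instance of the job flow on a single input data object block."""
    data = input_block
    for task in container.job_flow:
        data = container.routines[task](data)
    return data


def run_on_device(container: Container, local_blocks: Dict[int, bytes]) -> Dict[int, bytes]:
    """Simulate a storage device running one instance per locally stored block."""
    return {index: perform_instance(container, block) for index, block in local_blocks.items()}


def perform_job_flow(container: Container,
                     blocks_by_device: Dict[str, Dict[int, bytes]]) -> bytes:
    """Hand the container to every device, run them in parallel, assemble the report."""
    result_blocks: Dict[int, bytes] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(blocks_by_device))) as pool:
        # Every device receives the same container; the tasks only read it.
        futures = [pool.submit(run_on_device, container, local)
                   for local in blocks_by_device.values()]
        for future in futures:
            result_blocks.update(future.result())
    # Assemble the result report by concatenating result blocks in block order.
    return b"".join(result_blocks[i] for i in sorted(result_blocks))


if __name__ == "__main__":
    container = Container(
        job_flow=["strip_spaces", "upper"],
        routines={"strip_spaces": lambda b: b.replace(b" ", b""), "upper": bytes.upper},
    )
    layout = {"dev0": {0: b"ab c", 2: b"g h"}, "dev1": {1: b"de f"}}
    print(perform_job_flow(container, layout))  # b'ABCDEFGH'
```

Concatenation is only a valid assembly step when each task transforms its block independently of the others; a job flow whose result report aggregates across blocks (for example, a single total) would need an extra merge step that the claim does not describe.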
