Many task computing with message passing interface

  • US 10,657,107 B1
  • Filed: 12/29/2019
  • Issued: 05/19/2020
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • receive, at the processor, a first request to store a flow input data set in a federated area, wherein:

    at least one federated area is defined within storage space provided by at least one of a set of storage devices to store objects to perform a job flow;

    the objects to perform the job flow comprise a job flow definition that defines the job flow as a set of tasks to be performed, and a corresponding set of task routines to perform the set of tasks;

    processors associated with the set of storage devices cooperate to maintain a distributed file system as spanning storage spaces provided by each storage device of the set of storage devices; and

    as part of maintaining the distributed file system, at least one processor associated with at least one storage device of the set of storage devices determines whether a data object received by the set of storage devices is to be stored as an undivided object or stored as a set of data object blocks into which the received data object is divided and distributed among the set of storage devices based on a size of the received data object compared to a distribution block size;

    compare a size of the flow input data set to a threshold size that is based on the distribution block size to determine whether the size of the flow input data set is larger than the threshold size; and

    in response to a determination that the size of the flow input data set is larger than the threshold size, the processor is caused to perform operations comprising:

    analyze the flow input data set to determine whether the flow input data set is of a distributable form in which data items of the flow input data set are organized into a single homogeneous data structure such that, after the flow input data set is divided into a set of data object blocks, the data items remain accessible from each data object block of the flow input data set independently of the other data object blocks of the flow input data set;

    in response to a determination that the flow input data set is not of the distributable form of the flow input data set, the processor is caused to perform operations comprising:

    convert the flow input data set from an original form and into the distributable form of the flow input data set; and

    following conversion of the original form of the flow input data set into the distributable form, provide the distributable form of the flow input data set to the set of storage devices to be divided by the set of storage devices into the set of data object blocks of the flow input data set that are to be stored in a distributed manner within a first federated area of the at least one federated area, wherein the first federated area is defined within the distributed file system.
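
The decision flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation; every name and value in it is hypothetical. It assumes that the threshold equals the distribution block size, that the "distributable form" is newline-delimited JSON records sharing one key set, that the "original form" is a single JSON array of objects, and that a plain dict stands in for the federated area. In the claim the storage devices perform the division into blocks; here divide_into_blocks plays that role.

```python
import json

BLOCK_SIZE = 64          # hypothetical distribution block size, in bytes
THRESHOLD = BLOCK_SIZE   # assumption: threshold equals the block size


def is_distributable(payload: bytes) -> bool:
    """True if payload is newline-delimited JSON objects sharing one key set."""
    try:
        rows = [json.loads(line) for line in payload.splitlines() if line.strip()]
    except ValueError:  # covers JSON and UTF-8 decoding errors
        return False
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = set(rows[0])
    return all(set(r) == keys for r in rows)


def to_distributable(payload: bytes) -> bytes:
    """Convert the assumed original form (one JSON array) to one record per line."""
    records = json.loads(payload)
    return b"".join(json.dumps(r, sort_keys=True).encode() + b"\n" for r in records)


def divide_into_blocks(payload: bytes, block_size: int) -> list:
    """Stand-in for the storage devices: split only on record boundaries."""
    blocks, current = [], b""
    for line in payload.splitlines(keepends=True):
        if current and len(current) + len(line) > block_size:
            blocks.append(current)
            current = b""
        current += line
    if current:
        blocks.append(current)
    return blocks


def store_flow_input(payload: bytes, federated_area: dict, name: str) -> None:
    """Compare size to the threshold, convert if needed, then store."""
    if len(payload) <= THRESHOLD:
        federated_area[name] = [payload]  # small enough: stored undivided
        return
    if not is_distributable(payload):
        payload = to_distributable(payload)
    federated_area[name] = divide_into_blocks(payload, BLOCK_SIZE)


# Usage: a JSON array is larger than the threshold and not distributable as-is.
area = {}
original = json.dumps([{"id": i, "value": i * i} for i in range(10)]).encode()
store_flow_input(original, area, "flow_input")
blocks = area["flow_input"]
```

Because the sketch splits only at record boundaries, each stored block can be parsed without reference to the others, which mirrors the claim's requirement that data items remain accessible from each data object block independently.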
