Many task computing with message passing interface
First Claim
1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:
- receive, at the processor, a first request to store a flow input data set in a federated area, wherein:
at least one federated area is defined within storage space provided by at least one of a set of storage devices to store objects to perform a job flow;
the objects to perform the job flow comprise a job flow definition that defines the job flow as a set of tasks to be performed, and a corresponding set of task routines to perform the set of tasks;
processors associated with the set of storage devices cooperate to maintain a distributed file system as spanning storage spaces provided by each storage device of the set of storage devices; and
as part of maintaining the distributed file system, at least one processor associated with at least one storage device of the set of storage devices determines whether a data object received by the set of storage devices is to be stored as an undivided object or stored as a set of data object blocks into which the received data object is divided and distributed among the set of storage devices based on a size of the received data object compared to a distribution block size;
compare a size of the flow input data set to a threshold size that is based on the distribution block size to determine whether the size of the flow input data set is larger than the threshold size; and
in response to a determination that the size of the flow input data set is larger than the threshold size, the processor is caused to perform operations comprising:
analyze the flow input data set to determine whether the flow input data set is of a distributable form in which data items of the flow input data set are organized into a single homogeneous data structure such that, after the flow input data set is divided into a set of data object blocks, the data items remain accessible from each data object block of the flow input data set independently of the other data object blocks of the flow input data set;
in response to a determination that the flow input data set is not of the distributable form of the flow input data set, the processor is caused to perform operations comprising:
convert the flow input data set from an original form and into the distributable form of the flow input data set; and
following conversion of the original form of the flow input data set into the distributable form, provide the distributable form of the flow input data set to the set of storage devices to be divided by the set of storage devices into the set of data object blocks of the flow input data set that are to be stored in a distributed manner within a first federated area of the at least one federated area, wherein the first federated area is defined within the distributed file system.
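The decision flow recited in this claim can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the tiny `DISTRIBUTION_BLOCK_SIZE`, the choice to set the threshold equal to the block size, the use of newline-delimited JSON as the "distributable form", CSV as the "original form", and all function names. The `split_on_records` helper only stands in for the division that the claim assigns to the storage devices.

```python
import csv
import io
import json
from typing import List

DISTRIBUTION_BLOCK_SIZE = 64              # bytes; hypothetical and tiny for demonstration
THRESHOLD_SIZE = DISTRIBUTION_BLOCK_SIZE  # assumption: threshold equals the block size


def is_distributable(raw: bytes) -> bool:
    """Assumed test: every line is a JSON object and all objects share the same keys."""
    try:
        rows = [json.loads(line) for line in raw.decode().splitlines() if line.strip()]
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return bool(rows) and all(
        isinstance(r, dict) and r.keys() == rows[0].keys() for r in rows
    )


def to_distributable(raw: bytes) -> bytes:
    """Assumed conversion: CSV with a header row becomes one JSON record per line."""
    reader = csv.DictReader(io.StringIO(raw.decode()))
    return "".join(json.dumps(row) + "\n" for row in reader).encode()


def split_on_records(data: bytes, block_size: int) -> List[bytes]:
    """Stand-in for the storage devices: cut blocks only at record boundaries."""
    blocks, current = [], b""
    for line in data.splitlines(keepends=True):
        if current and len(current) + len(line) > block_size:
            blocks.append(current)
            current = b""
        current += line
    if current:
        blocks.append(current)
    return blocks


def store_flow_input(raw: bytes) -> List[bytes]:
    """Compare size to the threshold, convert if needed, then hand off for division."""
    if len(raw) <= THRESHOLD_SIZE:
        return [raw]                      # small enough to remain an undivided object
    if not is_distributable(raw):
        raw = to_distributable(raw)       # original form -> distributable form
    return split_on_records(raw, DISTRIBUTION_BLOCK_SIZE)


csv_data = ("id,value\n" + "".join(f"{i},{i * i}\n" for i in range(20))).encode()
blocks = store_flow_input(csv_data)
# Each block can be parsed on its own, without reference to any other block.
records_per_block = [[json.loads(l) for l in b.decode().splitlines()] for b in blocks]
```

Because every block in this sketch starts and ends on a record boundary, each one can be read independently, which is the property the claim attributes to the distributable form.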
Abstract
An apparatus includes a processor to: receive a request from a remote device to perform a job flow; retrieve a job flow definition defining the job flow and each of a set of task routines to perform tasks of the job flow from a set of storage devices where each is stored as an undivided object within one storage device; and in response to determining that a data set is stored as multiple data object blocks, generate a container containing the job flow definition and set of task routines to enable each storage device to perform the job flow using a locally stored data object block of the data set as input to generate a corresponding data object block of a result report, provide a copy of the container to each storage device, and transmit the result report assembled from the data object blocks thereof to the remote device.
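The container mechanism summarized in the abstract can be sketched as follows. This is a single-process simulation under stated assumptions, not the patented implementation and not an MPI program: the `Container` class, the `perform_job_flow` function, the device names, and the two task routines are all hypothetical, and "providing a copy to each storage device" is modeled with `copy.copy` rather than any network transfer.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List

TaskRoutine = Callable[[List[int]], List[int]]


@dataclass
class Container:
    """Hypothetical bundle of a job flow definition and its task routines."""
    job_flow_definition: List[str]        # ordered task names
    task_routines: Dict[str, TaskRoutine]

    def run(self, block: List[int]) -> List[int]:
        # Perform each task of the job flow, in order, on one data object block.
        for task in self.job_flow_definition:
            block = self.task_routines[task](block)
        return block


def perform_job_flow(container: Container,
                     blocks_by_device: Dict[str, List[int]]) -> List[int]:
    """Run the job flow on each device's local block, then assemble the result report."""
    result_blocks = {}
    for device, local_block in blocks_by_device.items():
        device_copy = copy.copy(container)   # each device receives its own copy
        result_blocks[device] = device_copy.run(local_block)
    report: List[int] = []
    for device in sorted(result_blocks):     # assumed assembly order: by device name
        report.extend(result_blocks[device])
    return report


container = Container(
    job_flow_definition=["square", "drop_odd"],
    task_routines={
        "square": lambda xs: [x * x for x in xs],
        "drop_odd": lambda xs: [x for x in xs if x % 2 == 0],
    },
)
report = perform_job_flow(container, {"dev_a": [1, 2], "dev_b": [3, 4]})
print(report)  # [4, 16]
```

Each device only ever touches its own block, so no block has to be moved to another device before the job flow runs; only the container and the result blocks travel.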
30 Claims
11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor to perform operations comprising:
- receive, at the processor, a first request to store a flow input data set in a federated area, wherein:
at least one federated area is defined within storage space provided by at least one of a set of storage devices to store objects to perform a job flow;
the objects to perform the job flow comprise a job flow definition that defines the job flow as a set of tasks to be performed, and a corresponding set of task routines to perform the set of tasks;
processors associated with the set of storage devices cooperate to maintain a distributed file system as spanning storage spaces provided by each storage device of the set of storage devices; and
as part of maintaining the distributed file system, at least one processor associated with at least one storage device of the set of storage devices determines whether a data object received by the set of storage devices is to be stored as an undivided object or stored as a set of data object blocks into which the received data object is divided and distributed among the set of storage devices based on a size of the received data object compared to a distribution block size;
compare a size of the flow input data set to a threshold size that is based on the distribution block size to determine whether the size of the flow input data set is larger than the threshold size; and
in response to a determination that the size of the flow input data set is larger than the threshold size, the processor is caused to perform operations comprising:
analyze the flow input data set to determine whether the flow input data set is of a distributable form in which data items of the flow input data set are organized into a single homogeneous data structure such that, after the flow input data set is divided into a set of data object blocks, the data items remain accessible from each data object block of the flow input data set independently of the other data object blocks of the flow input data set;
in response to a determination that the flow input data set is not of the distributable form of the flow input data set, the processor is caused to perform operations comprising:
convert the flow input data set from an original form and into the distributable form of the flow input data set; and
following conversion of the original form of the flow input data set into the distributable form, provide the distributable form of the flow input data set to the set of storage devices to be divided by the set of storage devices into the set of data object blocks of the flow input data set that are to be stored in a distributed manner within a first federated area of the at least one federated area, wherein the first federated area is defined within the distributed file system.
21. A computer-implemented method comprising:
- receiving, by a processor, a first request to store a flow input data set in a federated area, wherein:
at least one federated area is defined within storage space provided by at least one of a set of storage devices to store objects to perform a job flow;
the objects to perform the job flow comprise a job flow definition that defines the job flow as a set of tasks to be performed, and a corresponding set of task routines to perform the set of tasks;
processors associated with the set of storage devices cooperate to maintain a distributed file system as spanning storage spaces provided by each storage device of the set of storage devices; and
as part of maintaining the distributed file system, at least one processor associated with at least one storage device of the set of storage devices determines whether a data object received by the set of storage devices is to be stored as an undivided object or stored as a set of data object blocks into which the received data object is divided and distributed among the set of storage devices based on a size of the received data object compared to a distribution block size;
comparing, by the processor, a size of the flow input data set to a threshold size that is based on the distribution block size to determine whether the size of the flow input data set is larger than the threshold size; and
in response to a determination that the size of the flow input data set is larger than the threshold size, performing operations comprising:
analyzing, by the processor, the flow input data set to determine whether the flow input data set is of a distributable form in which data items of the flow input data set are organized into a single homogeneous data structure such that, after the flow input data set is divided into a set of data object blocks, the data items remain accessible from each data object block of the flow input data set independently of the other data object blocks of the flow input data set;
in response to a determination that the flow input data set is not of the distributable form of the flow input data set, performing operations comprising:
converting, by the processor, the flow input data set from an original form and into the distributable form of the flow input data set; and
following conversion of the original form of the flow input data set into the distributable form, providing the distributable form of the flow input data set to the set of storage devices to be divided by the set of storage devices into the set of data object blocks of the flow input data set that are to be stored in a distributed manner within a first federated area of the at least one federated area, wherein the first federated area is defined within the distributed file system.
Specification