Distributed data set storage and analysis reproducibility
First Claim
1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:
maintain, within one or more storage devices, a federated area to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs;
provide, on a network, a portal to control access by a remote device to the federated area via the network;
receive, at the portal, and from the remote device via the network, a first request to execute at least one task routine stored in the federated area to perform at least one corresponding task of a job flow specified in a job flow definition stored in the federated area with at least one data set stored in the federated area, wherein the first request specifies the job flow definition and the at least one data set;
retrieve the job flow definition from among the multiple job flow definitions stored in the federated area;
retrieve the at least one data set from among the multiple data sets stored in the federated area;
determine whether there are one or more instance logs among the multiple instance logs stored in the federated area that were each generated by a previous performance of the at least one task of the job flow with the at least one data set;
in response to a determination that there is just one instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:
retrieve, from among the multiple task routines stored in the federated area, a version specified by the one instance log of each task routine of the at least one task routine;
execute the retrieved version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
store the new result report among the multiple result reports in the federated area;
store the new instance log among the multiple instance logs in the federated area; and
provide access to the new result report to the remote device via the network;
in response to a determination that there is no instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:
retrieve, from among the multiple task routines stored in the federated area, a most recent version of each task routine of the at least one task routine;
execute the most recent version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
store the new result report among the multiple result reports in the federated area;
store the new instance log among the multiple instance logs in the federated area; and
provide access to the new result report to the remote device via the network; and
in response to a determination that there is more than one instance log among the multiple instance logs that were each generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to provide, via the network, an indication of the more than one instance log to the remote device.
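The three-way branch recited in the claim (no matching instance log, exactly one, or more than one) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the in-memory `FederatedArea`, the `InstanceLog` fields, the `handle_request` function, integer version numbers, and the choice to match data sets regardless of the order in which the request lists them.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceLog:
    flow_id: str
    data_set_ids: tuple        # sorted ids of the data sets used (assumed matching rule)
    routine_versions: dict     # task name -> version of the task routine that ran


@dataclass
class FederatedArea:
    # Hypothetical in-memory stand-in for the stored objects named in the claim.
    data_sets: dict = field(default_factory=dict)      # id -> data
    job_flows: dict = field(default_factory=dict)      # flow id -> list of task names
    task_routines: dict = field(default_factory=dict)  # task name -> {version: callable}
    result_reports: list = field(default_factory=list)
    instance_logs: list = field(default_factory=list)


def handle_request(area, flow_id, data_set_ids):
    """Perform a job flow, reusing logged routine versions when exactly one log matches."""
    tasks = area.job_flows[flow_id]
    data = [area.data_sets[i] for i in data_set_ids]
    key = tuple(sorted(data_set_ids))
    prior = [log for log in area.instance_logs
             if log.flow_id == flow_id and log.data_set_ids == key]

    if len(prior) > 1:
        # More than one earlier performance: report the logs instead of executing.
        return {"status": "ambiguous", "instance_logs": prior}

    if len(prior) == 1:
        # Exactly one earlier performance: use the versions it recorded.
        versions = dict(prior[0].routine_versions)
    else:
        # No earlier performance: use the most recent version of each routine.
        versions = {t: max(area.task_routines[t]) for t in tasks}

    report = {t: area.task_routines[t][versions[t]](data) for t in tasks}
    area.result_reports.append(report)
    area.instance_logs.append(InstanceLog(flow_id, key, versions))
    return {"status": "done", "result_report": report}
```

In this sketch a second request for the same job flow and data set still runs the originally logged routine versions even if newer versions have since been stored, and it also stores a second instance log. A third identical request therefore finds two matching logs and takes the "more than one" branch.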
Abstract
An apparatus includes a processor and a storage storing instructions causing the processor to: maintain a federated area; receive a request to perform a job flow with a data set from a remote device; retrieve a job flow definition specifying the tasks of the job flow from the federated area; determine whether there is an instance log in the federated area generated by a previous performance of the job flow with the data set; in response to there being such an instance log, retrieve the version specified in the instance log of each task routine for each task from the federated area; in response to there being no such instance log, retrieve the most recent version of each task routine; perform the job flow with the retrieved versions of the task routines and the data set to generate a result report; and provide the result report to the remote device.
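The step of determining whether an instance log exists for a given job flow and data set could be served by a simple lookup index, sketched below. The class `InstanceLogIndex`, its method names, the string identifiers, and the order-independent matching of data sets are all assumptions made for illustration; the abstract does not say how the determination is implemented.

```python
from collections import defaultdict


class InstanceLogIndex:
    """Hypothetical index: (job flow id, set of data set ids) -> instance log ids."""

    def __init__(self):
        self._by_key = defaultdict(list)

    @staticmethod
    def _key(flow_id, data_set_ids):
        # frozenset makes the match independent of the order used in the request
        # (an assumption, not something the abstract specifies).
        return (flow_id, frozenset(data_set_ids))

    def record(self, log_id, flow_id, data_set_ids):
        """Register an instance log generated by one performance of a job flow."""
        self._by_key[self._key(flow_id, data_set_ids)].append(log_id)

    def find(self, flow_id, data_set_ids):
        """Return the ids of all instance logs for this job flow and these data sets."""
        return list(self._by_key.get(self._key(flow_id, data_set_ids), []))
```

The length of the list returned by `find` is what separates the cases the abstract describes: an empty list means the most recent task routine versions would be retrieved, while a non-empty list points to a log whose recorded versions would be retrieved instead.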
58 Citations
29 Claims
11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor to perform operations comprising:
maintain, within one or more storage devices, a federated area to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs;
provide, on a network, a portal to control access by a remote device to the federated area via the network;
receive, at the portal, and from the remote device via the network, a first request to execute at least one task routine stored in the federated area to perform at least one corresponding task of a job flow specified in a job flow definition stored in the federated area with at least one data set stored in the federated area, wherein the first request specifies the job flow definition and the at least one data set;
retrieve the job flow definition from among the multiple job flow definitions stored in the federated area;
retrieve the at least one data set from among the multiple data sets stored in the federated area;
determine whether there are one or more instance logs among the multiple instance logs stored in the federated area that were each generated by a previous performance of the at least one task of the job flow with the at least one data set;
in response to a determination that there is just one instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:
retrieve, from among the multiple task routines stored in the federated area, a version specified by the one instance log of each task routine of the at least one task routine;
execute the retrieved version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
store the new result report among the multiple result reports in the federated area;
store the new instance log among the multiple instance logs in the federated area; and
provide access to the new result report to the remote device via the network;
in response to a determination that there is no instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:
retrieve, from among the multiple task routines stored in the federated area, a most recent version of each task routine of the at least one task routine;
execute the most recent version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
store the new result report among the multiple result reports in the federated area;
store the new instance log among the multiple instance logs in the federated area; and
provide access to the new result report to the remote device via the network; and
in response to a determination that there is more than one instance log among the multiple instance logs that were each generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to provide, via the network, an indication of the more than one instance log to the remote device.
21. A computer-implemented method comprising:
maintaining, at a server by a processor, and within one or more storage devices, a federated area to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs;
providing, at the server by the processor, a portal to control access by a remote device to the federated area through a network coupled to the server;
receiving, at the portal at the server, and from the remote device via the network, a first request to execute at least one task routine stored in the federated area to perform at least one corresponding task of a job flow specified in a job flow definition stored in the federated area with at least one data set stored in the federated area, wherein the first request specifies the job flow definition and the at least one data set;
retrieving, by the processor, the job flow definition from among the multiple job flow definitions stored in the federated area;
retrieving, by the processor, the at least one data set from among the multiple data sets stored in the federated area;
determining, by the processor, whether there are one or more instance logs among the multiple instance logs stored in the federated area that were each generated by a previous performance of the at least one task of the job flow with the at least one data set;
in response to a determination by the processor that there is just one instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, performing operations comprising:
retrieving, by the processor and from among the multiple task routines stored in the federated area, a version specified by the one instance log of each task routine of the at least one task routine;
executing the retrieved version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
storing, by the processor, the new result report among the multiple result reports in the federated area;
storing, by the processor, the new instance log among the multiple instance logs in the federated area; and
providing access to the new result report to the remote device via the network;
in response to a determination by the processor that there is no instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, performing operations comprising:
retrieving, by the processor and from among the multiple task routines stored in the federated area, a most recent version of each task routine of the at least one task routine;
executing the most recent version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;
storing, by the processor, the new result report among the multiple result reports in the federated area;
storing, by the processor, the new instance log among the multiple instance logs in the federated area; and
providing access to the new result report to the remote device via the network; and
in response to a determination by the processor that there is more than one instance log among the multiple instance logs that were each generated by a previous performance of the at least one task of the job flow with the at least one data set, providing, via the network, an indication of the more than one instance log to the remote device.
Specification