
Distributed data set storage and analysis reproducibility

  • US 9,852,013 B2
  • Filed: 06/05/2017
  • Issued: 12/26/2017
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • receive, at a portal, and from a remote device via a network, a first request to execute at least one task routine specified by the first request as stored in a federated area to perform at least one corresponding task of a job flow specified in a job flow definition stored in the federated area with at least one data set specified by the first request as stored in the federated area, wherein:

    the portal is provided on the network to control access to the federated area by the remote device via the network; and

    the federated area is maintained within one or more storage devices to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs;

  • retrieve the job flow definition from among the multiple job flow definitions stored in the federated area;

  • retrieve the at least one data set from among the multiple data sets stored in the federated area;

  • determine whether there is at least one instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set;

  • in response to a determination that there is just a single instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:

      retrieve a version specified by the single instance log of each task routine of the at least one task routine from among the multiple task routines stored in the federated area;

      execute the retrieved version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;

      store the new result report among the multiple result reports in the federated area;

      store the new instance log among the multiple instance logs in the federated area; and

      provide access to the new result report to the remote device via the portal; and

  • in response to a determination that there is more than one instance log among the multiple instance logs that were each generated by a previous performance of the at least one task of the job flow with the at least one data set, the processor is caused to perform operations comprising:

      select the most recently generated one of the more than one instance log to be the single instance log;

      retrieve the version specified by the single instance log of each task routine of the at least one task routine from among the multiple task routines stored in the federated area;

      execute the retrieved version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;

      store the new result report among the multiple result reports in the federated area;

      store the new instance log among the multiple instance logs in the federated area; and

      provide access to the new result report to the remote device via the portal.
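The reproducibility logic of the claim can be illustrated with a minimal in-memory Python sketch. It is not the patented implementation: the names `FederatedArea`, `InstanceLog`, and `rerun_from_instance_log`, the `report-N` identifier scheme, and the simplification that a request names a job flow plus data sets (with the tasks taken from the job flow definition) are all hypothetical assumptions. The portal, network, and storage devices are omitted, and because the claim does not say what happens when no matching instance log exists, the sketch simply raises an error in that case.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical routine shape: receives the retrieved data sets, returns a task result.
TaskRoutine = Callable[[List[Any]], Any]


@dataclass
class InstanceLog:
    job_flow_id: str
    data_set_ids: Tuple[str, ...]
    routine_versions: Dict[str, str]  # task name -> task routine version used
    created_at: datetime
    result_report_id: str


@dataclass
class FederatedArea:
    data_sets: Dict[str, Any] = field(default_factory=dict)
    job_flows: Dict[str, List[str]] = field(default_factory=dict)  # job flow id -> task names
    task_routines: Dict[Tuple[str, str], TaskRoutine] = field(default_factory=dict)  # (task, version)
    result_reports: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    instance_logs: List[InstanceLog] = field(default_factory=list)


def rerun_from_instance_log(
    area: FederatedArea,
    job_flow_id: str,
    data_set_ids: List[str],
    now: Optional[datetime] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Repeat a prior performance of a job flow using the routine versions its log recorded."""
    task_names = area.job_flows[job_flow_id]          # retrieve the job flow definition
    data = [area.data_sets[d] for d in data_set_ids]  # retrieve the data sets
    key = tuple(data_set_ids)

    # Instance logs generated by previous performances of this job flow with these data sets.
    matches = [
        log for log in area.instance_logs
        if log.job_flow_id == job_flow_id and log.data_set_ids == key
    ]
    if not matches:
        # Assumption: the claim does not cover this case, so the sketch refuses.
        raise LookupError("no instance log for this job flow and data set combination")

    # A single match is used as-is; with several, the most recently generated one wins.
    chosen = max(matches, key=lambda log: log.created_at)

    # Execute the exact task routine versions recorded in the chosen instance log.
    report: Dict[str, Any] = {}
    for task in task_names:
        version = chosen.routine_versions[task]
        report[task] = area.task_routines[(task, version)](data)

    # Store the new result report and a new instance log in the federated area.
    report_id = f"report-{len(area.result_reports) + 1}"
    area.result_reports[report_id] = report
    area.instance_logs.append(
        InstanceLog(
            job_flow_id=job_flow_id,
            data_set_ids=key,
            routine_versions=dict(chosen.routine_versions),
            created_at=now if now is not None else datetime.now(timezone.utc),
            result_report_id=report_id,
        )
    )
    # Returning the report stands in for "provide access via the portal".
    return report_id, report
```

Using `max` over the matching logs collapses the claim's two branches into one step: with a single match, `max` returns that log unchanged, and with several it picks the latest by `created_at`. Each rerun also appends its own instance log, so a later request for the same job flow and data sets would select the log of that most recent rerun.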
