
Distributed data set storage and analysis reproducibility

  • US 10,078,710 B2
  • Filed: 12/22/2017
  • Issued: 09/18/2018
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • receive, at a portal, and from a remote device via a network, a first request to execute at least one task routine stored in a federated area to perform at least one corresponding task of a job flow described in a job flow definition stored in the federated area with at least one data set stored in the federated area, wherein:

    the portal is provided on the network to control access by the remote device to the federated area via the network;

    the federated area is maintained within one or more storage devices to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs; and

    the first request specifies a job flow identifier of the job flow definition and at least one data object identifier of the at least one data set;

    use the job flow identifier to retrieve the job flow definition from among the multiple job flow definitions stored in the federated area, wherein the job flow definition comprises a flow task identifier for each task of the at least one task of the job flow and specifies a relative order in which each task of the at least one task is to be performed in the job flow;

    use the at least one data object identifier to retrieve the at least one data set from among the multiple data sets stored in the federated area;

    use the job flow identifier and the at least one data object identifier as portions of a first index to search for any instance log among the multiple instance logs stored in the federated area to determine whether there is at least one instance log among the multiple instance logs that was generated by a previous performance of the at least one task of the job flow with the at least one data set;

    in response to a determination that there is at least one instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, perform operations comprising:

    use the first index to retrieve an instance log of the at least one instance log from among the multiple instance logs stored in the federated area, wherein for each task of the at least one task of the job flow, the retrieved instance log comprises a task routine identifier that specifies a version of task routine that was executed during the corresponding previous performance to perform the task; and

    for each task of the at least one task of the job flow, use the corresponding task routine identifier of the retrieved instance log to retrieve, from among the multiple task routines stored in the federated area, the version of task routine that was executed during the corresponding previous performance;

    in response to a determination that there is no instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, perform operations comprising:

    for each task of the at least one task of the job flow, match the corresponding flow task identifier in the job flow definition to a task routine identifier of a most recent version of task routine executable to perform the task that is stored among the multiple task routines stored in the federated area; and

    for each task of the at least one task of the job flow, use the task routine identifier of the most recent version of task routine executable to perform the task to retrieve the most recent version of task routine from among the multiple task routines stored in the federated area;

    for each task of the at least one task of the job flow, execute the retrieved version of task routine executable to perform the task to generate a new result report and a new instance log;

    store the new result report among the multiple result reports in the federated area;

    store the new instance log among the multiple instance logs in the federated area; and

    provide access to the new result report to the remote device via the portal.
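
The branching logic recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`FederatedArea`, `TaskRoutine`, `perform_job_flow`, the dictionary layouts) is hypothetical and chosen only for illustration. The sketch assumes the federated area is a set of in-memory dictionaries, that the index is a tuple of the job flow identifier and the data object identifiers, and that when several matching instance logs exist the most recently stored one is used (the claim only requires that one of them be retrieved).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskRoutine:
    routine_id: str      # task routine identifier, unique per stored version
    flow_task_id: str    # flow task identifier of the task this routine performs
    version: int
    run: Callable[[list], list]  # takes the list of data sets, returns a new list


@dataclass
class FederatedArea:
    # Hypothetical in-memory stand-in for the storage devices of the claim.
    data_sets: dict = field(default_factory=dict)      # data object id -> data set
    job_flows: dict = field(default_factory=dict)      # job flow id -> ordered flow task ids
    task_routines: dict = field(default_factory=dict)  # routine id -> TaskRoutine
    result_reports: list = field(default_factory=list)
    instance_logs: dict = field(default_factory=dict)  # index -> list of instance logs


def perform_job_flow(area: FederatedArea, job_flow_id: str, data_ids: list) -> dict:
    # Retrieve the job flow definition and the requested data sets.
    flow_task_ids = area.job_flows[job_flow_id]
    values = [area.data_sets[d] for d in data_ids]

    # Build the index from both identifiers and search for earlier instance logs.
    index = (job_flow_id, tuple(data_ids))
    prior_logs = area.instance_logs.get(index, [])

    if prior_logs:
        # A previous performance exists: reuse the exact routine versions it logged.
        log = prior_logs[-1]  # assumption: take the most recently stored log
        routines = [area.task_routines[log["routine_ids"][t]] for t in flow_task_ids]
    else:
        # No previous performance: pick the most recent version for each task.
        routines = [
            max(
                (r for r in area.task_routines.values() if r.flow_task_id == t),
                key=lambda r: r.version,
            )
            for t in flow_task_ids
        ]

    # Execute the tasks in the order given by the job flow definition.
    for routine in routines:
        values = routine.run(values)

    # Generate and store the new result report and the new instance log.
    report = {"job_flow_id": job_flow_id, "data_ids": list(data_ids), "result": values}
    new_log = {"routine_ids": {t: r.routine_id for t, r in zip(flow_task_ids, routines)}}
    area.result_reports.append(report)
    area.instance_logs.setdefault(index, []).append(new_log)
    return report  # in the claim, access is provided to the remote device via the portal
```

Under these assumptions, repeating a request with the same job flow identifier and data object identifiers reruns the routine versions recorded in the earlier instance log, even if newer versions have been stored since. A request with a data set that has no prior instance log picks up the newest version of each routine.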
