
Distributed data set storage, retrieval and analysis

  • US 9,684,543 B1
  • Filed: 02/06/2017
  • Issued: 06/20/2017
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • maintain, within one or more storage devices, a federated area to store multiple data sets, multiple job flow definitions, multiple task routines, multiple result reports and multiple instance logs;

    provide, on a network, a portal to control access by a remote device to the federated area via the network;

    receive, at the portal, and from the remote device via the network, a first request to execute at least one task routine stored in the federated area to perform at least one corresponding task of a job flow described in a job flow definition stored in the federated area with at least one data set stored in the federated area, wherein the first request specifies the job flow definition and the at least one data set;

    retrieve the job flow definition from among the multiple job flow definitions stored in the federated area, wherein the job flow definition comprises a flow task identifier to identify each task of the job flow and specifies a relative order in which each task is to be performed in the job flow;

    for each task of the job flow, retrieve, from among the multiple task routines stored in the federated area, a most recent version of the corresponding task routine of the at least one task routine;

    determine whether there is an instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set; and

    in response to a determination that there is an instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, perform operations comprising:

    retrieve, from among the multiple task routines stored in the federated area, a version specified by the instance log of each task routine of the at least one task routine;

    for each task of the at least one task of the job flow, compare the version specified by the instance log of each task routine of the at least one task routine to the most recent version of each task routine of the at least one task routine;

    in response to each version specified by the instance log of each task routine of the at least one task routine matching the most recent version of the same task routine, perform operations comprising:

    retrieve a result report that was generated by the previous performance of the at least one task of the job flow along with the instance log; and

    provide access to the result report to the remote device via the network; and

    in response to a determination that there is no instance log among the multiple instance logs stored in the federated area that was generated by a previous performance of the at least one task of the job flow with the at least one data set, perform operations comprising:

    retrieve the at least one data set from among the multiple data sets stored in the federated area;

    execute the most recent version of each task routine of the at least one task routine to perform the at least one corresponding task of the job flow with the at least one data set to generate a new result report and a new instance log;

    store the new result report among the multiple result reports in the federated area;

    store the new instance log among the multiple instance logs in the federated area; and

    provide access to the new result report to the remote device via the network.
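
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All names (`FederatedArea`, `InstanceLog`, `JobFlowDefinition`, `handle_request`), the in-memory dictionaries, the integer version numbers, the single data set per request, and the chaining of each task's output into the next task are assumptions made for illustration. The claim does not say what happens when an instance log exists but specifies an outdated task routine version; the sketch assumes the job flow is simply executed again in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class JobFlowDefinition:
    # Flow task identifiers, listed in the relative order of performance.
    task_ids: List[str]


@dataclass
class InstanceLog:
    flow_id: str
    data_set_id: str
    # Flow task identifier -> version of the task routine that was executed.
    routine_versions: Dict[str, int]
    result_report_id: str


@dataclass
class FederatedArea:
    """Hypothetical in-memory stand-in for the federated area."""
    data_sets: Dict[str, object] = field(default_factory=dict)
    job_flows: Dict[str, JobFlowDefinition] = field(default_factory=dict)
    # Flow task identifier -> {version number: task routine}.
    task_routines: Dict[str, Dict[int, Callable]] = field(default_factory=dict)
    result_reports: Dict[str, object] = field(default_factory=dict)
    instance_logs: List[InstanceLog] = field(default_factory=list)

    def latest_version(self, task_id: str) -> int:
        return max(self.task_routines[task_id])

    def find_instance_log(self, flow_id: str, data_set_id: str) -> Optional[InstanceLog]:
        # Search newest first so a re-run supersedes an older log.
        for log in reversed(self.instance_logs):
            if log.flow_id == flow_id and log.data_set_id == data_set_id:
                return log
        return None


def handle_request(area: FederatedArea, flow_id: str, data_set_id: str) -> Tuple[object, bool]:
    """Return (result report, executed), where executed is False on reuse."""
    flow = area.job_flows[flow_id]
    latest = {t: area.latest_version(t) for t in flow.task_ids}

    log = area.find_instance_log(flow_id, data_set_id)
    if log is not None and all(
        log.routine_versions.get(t) == latest[t] for t in flow.task_ids
    ):
        # Every logged version matches the most recent one: reuse the report.
        return area.result_reports[log.result_report_id], False

    # No instance log (the claimed branch), or a stale one (an assumption of
    # this sketch): execute the most recent routines in the defined order.
    value = area.data_sets[data_set_id]
    for t in flow.task_ids:
        value = area.task_routines[t][latest[t]](value)

    report_id = f"report-{len(area.result_reports) + 1}"
    area.result_reports[report_id] = value
    area.instance_logs.append(InstanceLog(flow_id, data_set_id, dict(latest), report_id))
    return value, True
```

In this sketch, a second identical request returns the stored result report without executing any task routine, and registering a newer version of any routine causes the next request to execute the job flow again and store a new result report and instance log.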
