Generation of job flow objects in federated areas from data structure

  • US 10,394,890 B2
  • Filed: 12/20/2018
  • Issued: 08/27/2019
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • receive, by the processor and from an input device, a request to generate a visualization of a directed acyclic graph (DAG) of a job flow of multiple tasks of an analysis that is based on data and multiple formulae incorporated into a spreadsheet data structure specified in the request, wherein:

    the data incorporated into the spreadsheet data structure is organized into at least one data table within the spreadsheet data structure;

    each data table is divisible into multiple data subparts that each comprise at least one row or at least one column of the data table;

    the multiple formulae are organized into at least one formula table within the spreadsheet data structure; and

    each formula of the multiple formulae specifies a task of the multiple tasks, and incorporates at least one indication of data required as input to perform the task and at least one indication of data that is generated as output when the task is performed;

    correlate each indication of data required as input to a subpart of a data table of the at least one data table;

    correlate each indication of data generated as output to a subpart of a data table of the at least one data table;

    among the multiple formulae, correlate the indications of data required as input to the indications of data generated as output to identify data dependencies among the multiple tasks;

    identify at least one pair of tasks of the multiple tasks that are able to be performed in parallel due to a lack of data dependencies therebetween;

    determine an order of performance of the multiple tasks based on the identification of the data dependencies and the at least one pair of tasks;

    generate, within a federated area specified in the request, a job flow definition that specifies the order of performance of the multiple tasks, wherein:

    each task of the multiple tasks is specified with a unique flow task identifier of multiple flow task identifiers; and

    the job flow definition includes an indication of each identified pair of tasks able to be performed in parallel;

    for each task of the multiple tasks, generate, within the specified federated area, a corresponding macro data structure of multiple macro data structures, wherein each macro data structure comprises:

    the flow task identifier of the task;

    indications of characteristics of at least one input interface for each input that is required to perform the task; and

    indications of characteristics of at least one output interface for each output that is generated when the task is performed; and

    for each indication of data required as an input to a task of the multiple tasks, the processor is caused to:

    in response to an indication of data required as an input to a task that is not able to be correlated to a subpart of a data table, augment the macro data structure that corresponds to the task to include an indication of missing data required as an input to the task to enable the inclusion of a corresponding visual error indication in a visual representation of the task; or

    in response to a data type mismatch between a data type specified in an indication of data required as an input to a task and a data type of the subpart of the data table of the at least one data table that is correlated to the indication, augment the macro data structure that corresponds to the task to include an indication of a data type mismatch error to enable the inclusion of a corresponding visual error indication in a visual representation of the task;

    generate the requested visualization based on the job flow definition and the multiple macro data structures, wherein:

    the visualization includes, for each macro data structure of the multiple macro data structures, a visual representation of the corresponding task of the multiple tasks; and

    each representation of a task comprises:

    a task graph object;

    at least one input data graph object that represents an input, that has a connection to the task graph object in the representation, and that comprises a visual indication of the at least one characteristic of the input; and

    at least one output data graph object that represents an output, that has a connection to the task graph object in the representation, and that comprises an indication of the at least one characteristic of the output.
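
The dependency, parallelism, and ordering steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each formula has already been reduced to a task identifier mapped to a pair of sets (names of input subparts, names of output subparts). The function names find_dependencies, parallel_pairs, missing_inputs, and plan, and the sample task and column names, are hypothetical. Two tasks are treated as parallelizable when neither depends on the other, directly or transitively, which is one possible reading of "a lack of data dependencies therebetween".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import combinations


def find_dependencies(formulas):
    """Map each task to the set of tasks whose outputs it consumes."""
    producers = {}
    for task, (_, outputs) in formulas.items():
        for name in outputs:
            producers.setdefault(name, set()).add(task)
    deps = {task: set() for task in formulas}
    for task, (inputs, _) in formulas.items():
        for name in inputs:
            # A task is not counted as depending on itself.
            deps[task] |= producers.get(name, set()) - {task}
    return deps


def _ancestors(task, deps):
    """Collect every task reachable by following dependencies upstream."""
    seen, stack = set(), list(deps[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def parallel_pairs(deps):
    """List task pairs with no direct or transitive dependency between them."""
    upstream = {task: _ancestors(task, deps) for task in deps}
    return [
        (a, b)
        for a, b in combinations(sorted(deps), 2)
        if a not in upstream[b] and b not in upstream[a]
    ]


def missing_inputs(formulas, subparts):
    """Report, per task, the inputs that match no known data table subpart."""
    report = {}
    for task, (inputs, _) in formulas.items():
        unmatched = sorted(set(inputs) - set(subparts))
        if unmatched:
            report[task] = unmatched
    return report


def plan(formulas):
    """Return one valid order of performance plus the parallelizable pairs."""
    deps = find_dependencies(formulas)
    # TopologicalSorter raises graphlib.CycleError if the graph is not a DAG.
    order = list(TopologicalSorter(deps).static_order())
    return order, parallel_pairs(deps)


# Hypothetical example: t1 and t2 read raw columns, t3 combines their outputs.
formulas = {
    "t1": ({"A"}, {"C"}),
    "t2": ({"B"}, {"D"}),
    "t3": ({"C", "D"}, {"E"}),
}
order, pairs = plan(formulas)
```

In this example t3 is ordered after both t1 and t2, and the pair (t1, t2) is reported as parallelizable. Calling missing_inputs with a set of subparts that omits column B flags t2, which corresponds to the missing-data indication that the claim adds to a task's macro data structure.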
