Generation of job flow objects in federated areas from data structure
First Claim
1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:
receive, by the processor and from an input device, a request to generate a visualization of a directed acyclic graph (DAG) of a job flow of multiple tasks of an analysis that is based on data and multiple formulae incorporated into a spreadsheet data structure specified in the request, wherein:
the data incorporated into the spreadsheet data structure is organized into at least one data table within the spreadsheet data structure;
each data table is divisible into multiple data subparts that each comprise at least one row or at least one column of the data table;
the multiple formulae are organized into at least one formula table within the spreadsheet data structure; and
each formula of the multiple formulae specifies a task of the multiple tasks, and incorporates at least one indication of data required as input to perform the task and at least one indication of data that is generated as output when the task is performed;
correlate each indication of data required as input to a subpart of a data table of the at least one data table;
correlate each indication of data generated as output to a subpart of a data table of the at least one data table;
among the multiple formulae, correlate the indications of data required as input to the indications of data generated as output to identify data dependencies among the multiple tasks;
identify at least one pair of tasks of the multiple tasks that are able to be performed in parallel due to a lack of data dependencies therebetween;
determine an order of performance of the multiple tasks based on the identification of the data dependencies and the at least one pair of tasks;
generate, within a federated area specified in the request, a job flow definition that specifies the order of performance of the multiple tasks, wherein:
each task of the multiple tasks is specified with a unique flow task identifier of multiple flow task identifiers; and
the job flow definition includes an indication of each identified pair of tasks able to be performed in parallel;
for each task of the multiple tasks, generate, within the specified federated area, a corresponding macro data structure of multiple macro data structures, wherein each macro data structure comprises:
the flow task identifier of the task;
indications of characteristics of at least one input interface for each input that is required to perform the task; and
indications of characteristics of at least one output interface for each output that is generated when the task is performed; and
for each indication of data required as an input to a task of the multiple tasks, the processor is caused to:
in response to an indication of data required as an input to a task that is not able to be correlated to a subpart of a data table, augment the macro data structure that corresponds to the task to include an indication of missing data required as an input to the task to enable the inclusion of a corresponding visual error indication in a visual representation of the task; or
in response to a data type mismatch between a data type specified in an indication of data required as an input to a task and a data type of the subpart of the data table of the at least one data table that is correlated to the indication, augment the macro data structure that corresponds to the task to include an indication of a data type mismatch error to enable the inclusion of a corresponding visual error indication in a visual representation of the task;
generate the requested visualization based on the job flow definition and the multiple macro data structures, wherein:
the visualization includes, for each macro data structure of the multiple macro data structures, a visual representation of the corresponding task of the multiple tasks; and
each representation of a task comprises:
a task graph object;
at least one input data graph object that represents an input, that has a connection to the task graph object in the representation, and that comprises a visual indication of the at least one characteristic of the input; and
at least one output data graph object that represents an output, that has a connection to the task graph object in the representation, and that comprises an indication of the at least one characteristic of the output.
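The macro data structure and its two error indications can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the claimed implementation: the names `InputRef`, `Macro`, `build_macro`, the error labels, and the dotted subpart names such as `sales.amount` are all hypothetical, and a data table subpart is reduced to a name mapped to a data type string.

```python
from dataclasses import dataclass, field


@dataclass
class InputRef:
    """Hypothetical stand-in for an indication of data required as input."""
    name: str   # subpart the formula refers to, e.g. "sales.amount"
    dtype: str  # data type the formula expects


@dataclass
class Macro:
    """Hypothetical stand-in for a macro data structure."""
    flow_task_id: str
    inputs: list
    outputs: list
    errors: list = field(default_factory=list)


def build_macro(flow_task_id, inputs, outputs, table_subparts):
    """Build a macro and record an error for each input that cannot be correlated.

    table_subparts maps each known subpart name to its data type (an assumption
    made for this sketch).
    """
    macro = Macro(flow_task_id, list(inputs), list(outputs))
    for ref in inputs:
        if ref.name not in table_subparts:
            # The input cannot be correlated to any subpart: missing data.
            macro.errors.append({"input": ref.name, "error": "missing_data"})
        elif table_subparts[ref.name] != ref.dtype:
            # The input correlates to a subpart, but the data types differ.
            macro.errors.append({"input": ref.name, "error": "type_mismatch"})
    return macro
```

A renderer could then consult `macro.errors` to decide which input data graph objects receive a visual error indication.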
Abstract
An apparatus includes a processor to: receive a request to generate a DAG of a job flow of multiple tasks of an analysis based on data table(s) and formulae of a spreadsheet data structure; correlate each indication of data required as input or output to at least a subpart of a data table; identify data dependencies and determine an order of performance among the multiple tasks based on the formulae; generate, within the specified federated area, a job flow definition that specifies the order of performance of the multiple tasks; for each task of the multiple tasks, generate, within the specified federated area, a corresponding macro data structure of multiple macro data structures; and generate the requested visualization based on the job flow definition and the multiple macro data structures.
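The dependency identification and ordering summarized above can be sketched in Python. This is a hedged illustration only: it assumes each formula has already been reduced to a set of input names and a set of output names, and the function names (`find_dependencies`, `parallel_pairs`, `performance_order`) are hypothetical. The ordering step uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) as one possible way to order a DAG; the source does not state which method is used.

```python
from graphlib import TopologicalSorter
from itertools import combinations


def find_dependencies(formulas):
    """Map each task to the set of tasks whose outputs it consumes.

    formulas: {task_id: (set_of_input_names, set_of_output_names)}
    """
    producers = {}
    for task, (_, outputs) in formulas.items():
        for name in outputs:
            producers[name] = task
    deps = {task: set() for task in formulas}
    for task, (inputs, _) in formulas.items():
        for name in inputs:
            producer = producers.get(name)
            if producer is not None and producer != task:
                deps[task].add(producer)
    return deps


def _ancestors(task, deps):
    """Collect every task that `task` depends on, directly or transitively."""
    seen, stack = set(), list(deps[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


def parallel_pairs(deps):
    """Return pairs of tasks with no data dependency in either direction."""
    anc = {task: _ancestors(task, deps) for task in deps}
    return [
        (a, b)
        for a, b in combinations(sorted(deps), 2)
        if a not in anc[b] and b not in anc[a]
    ]


def performance_order(deps):
    """Return one valid order of performance; raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())
```

For example, with a `clean` task feeding both `sum` and `avg`, which in turn feed `report`, `parallel_pairs` reports only `("avg", "sum")`, and any order returned by `performance_order` starts with `clean` and ends with `report`.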
27 Claims
10. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor to perform operations comprising:
receive, by the processor and from an input device, a request to generate a visualization of a directed acyclic graph (DAG) of a job flow of multiple tasks of an analysis that is based on data and multiple formulae incorporated into a spreadsheet data structure specified in the request, wherein:
the data incorporated into the spreadsheet data structure is organized into at least one data table within the spreadsheet data structure;
each data table is divisible into multiple data subparts that each comprise at least one row or at least one column of the data table;
the multiple formulae are organized into at least one formula table within the spreadsheet data structure; and
each formula of the multiple formulae specifies a task of the multiple tasks, and incorporates at least one indication of data required as input to perform the task and at least one indication of data that is generated as output when the task is performed;
correlate each indication of data required as input to a subpart of a data table of the at least one data table;
correlate each indication of data generated as output to a subpart of a data table of the at least one data table;
among the multiple formulae, correlate the indications of data required as input to the indications of data generated as output to identify data dependencies among the multiple tasks;
identify at least one pair of tasks of the multiple tasks that are able to be performed in parallel due to a lack of data dependencies therebetween;
determine an order of performance of the multiple tasks based on the identification of the data dependencies and the at least one pair of tasks;
generate, within a federated area specified in the request, a job flow definition that specifies the order of performance of the multiple tasks, wherein:
each task of the multiple tasks is specified with a unique flow task identifier of multiple flow task identifiers; and
the job flow definition includes an indication of each identified pair of tasks able to be performed in parallel;
for each task of the multiple tasks, generate, within the specified federated area, a corresponding macro data structure of multiple macro data structures, wherein each macro data structure comprises:
the flow task identifier of the task;
indications of characteristics of at least one input interface for each input that is required to perform the task; and
indications of characteristics of at least one output interface for each output that is generated when the task is performed; and
for each indication of data required as an input to a task of the multiple tasks, the processor is caused to:
in response to an indication of data required as an input to a task that is not able to be correlated to a subpart of a data table, augment the macro data structure that corresponds to the task to include an indication of missing data required as an input to the task to enable the inclusion of a corresponding visual error indication in a visual representation of the task; or
in response to a data type mismatch between a data type specified in an indication of data required as an input to a task and a data type of the subpart of the data table of the at least one data table that is correlated to the indication, augment the macro data structure that corresponds to the task to include an indication of a data type mismatch error to enable the inclusion of a corresponding visual error indication in a visual representation of the task;
generate the requested visualization based on the job flow definition and the multiple macro data structures, wherein:
the visualization includes, for each macro data structure of the multiple macro data structures, a visual representation of the corresponding task of the multiple tasks; and
each representation of a task comprises:
a task graph object;
at least one input data graph object that represents an input, that has a connection to the task graph object in the representation, and that comprises a visual indication of the at least one characteristic of the input; and
at least one output data graph object that represents an output, that has a connection to the task graph object in the representation, and that comprises an indication of the at least one characteristic of the output.
19. A computer-implemented method comprising:
receiving, by a processor and from a requesting device, a request to generate a visualization of a directed acyclic graph (DAG) of a job flow of multiple tasks of an analysis that is based on data and multiple formulae incorporated into a spreadsheet data structure specified in the request, wherein:
the data incorporated into the spreadsheet data structure is organized into at least one data table within the spreadsheet data structure;
each data table is divisible into multiple data subparts that each comprise at least one row or at least one column of the data table;
the multiple formulae are organized into at least one formula table within the spreadsheet data structure; and
each formula of the multiple formulae specifies a task of the multiple tasks, and incorporates at least one indication of data required as input to perform the task and at least one indication of data that is generated as output when the task is performed;
correlating, by the processor, each indication of data required as input to a subpart of a data table of the at least one data table;
correlating, by the processor, each indication of data generated as output to a subpart of a data table of the at least one data table;
among the multiple formulae, correlating, by the processor, the indications of data required as input to the indications of data generated as output to identify data dependencies among the multiple tasks;
identifying, by the processor, at least one pair of tasks of the multiple tasks that are able to be performed in parallel due to a lack of data dependencies therebetween;
determining, by the processor, an order of performance of the multiple tasks based on the identification of the data dependencies and the at least one pair of tasks;
generating, by the processor and within a federated area specified in the request, a job flow definition that specifies the order of performance of the multiple tasks, wherein:
each task of the multiple tasks is specified with a unique flow task identifier of multiple flow task identifiers; and
the job flow definition includes an indication of each identified pair of tasks able to be performed in parallel;
for each task of the multiple tasks, generating, by the processor and within the specified federated area, a corresponding macro data structure of multiple macro data structures, wherein each macro data structure comprises:
the flow task identifier of the task;
indications of characteristics of at least one input interface for each input that is required to perform the task; and
indications of characteristics of at least one output interface for each output that is generated when the task is performed; and
for each indication of data required as an input to a task of the multiple tasks, performing operations comprising:
in response to an indication of data required as an input to a task that is not able to be correlated to a subpart of a data table, augmenting the macro data structure that corresponds to the task to include an indication of missing data required as an input to the task to enable the inclusion of a corresponding visual error indication in a visual representation of the task; or
in response to a data type mismatch between a data type specified in an indication of data required as an input to a task and a data type of the subpart of the data table of the at least one data table that is correlated to the indication, augmenting the macro data structure that corresponds to the task to include an indication of a data type mismatch error to enable the inclusion of a corresponding visual error indication in a visual representation of the task;
generating, by the processor, the requested visualization based on the job flow definition and the multiple macro data structures, wherein:
the visualization includes, for each macro data structure of the multiple macro data structures, a visual representation of the corresponding task of the multiple tasks; and
each representation of a task comprises:
a task graph object;
at least one input data graph object that represents an input, that has a connection to the task graph object in the representation, and that comprises a visual indication of the at least one characteristic of the input; and
at least one output data graph object that represents an output, that has a connection to the task graph object in the representation, and that comprises an indication of the at least one characteristic of the output.
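The final visualization step can be illustrated by emitting Graphviz DOT text, with one box per task graph object and one ellipse per input or output data graph object. This is only a sketch under assumptions: the claims do not name an output format, DOT is chosen here for convenience, and the `to_dot` function and the dictionary layout of a macro (`flow_task_id`, `inputs`, `outputs` as lists of name and data type pairs) are hypothetical.

```python
def to_dot(macros):
    """Render hypothetical macro dictionaries as Graphviz DOT text.

    Each macro is assumed to look like:
      {"flow_task_id": "sum",
       "inputs": [("cleaned", "float")],
       "outputs": [("total", "float")]}
    """
    lines = ["digraph job_flow {"]
    for macro in macros:
        task = macro["flow_task_id"]
        # Task graph object.
        lines.append(f'  "{task}" [shape=box];')
        for name, dtype in macro["inputs"]:
            # Input data graph object, labeled with one characteristic (its type).
            lines.append(f'  "{name}" [shape=ellipse, label="{name}: {dtype}"];')
            lines.append(f'  "{name}" -> "{task}";')
        for name, dtype in macro["outputs"]:
            # Output data graph object, connected from the task that produces it.
            lines.append(f'  "{name}" [shape=ellipse, label="{name}: {dtype}"];')
            lines.append(f'  "{task}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)
```

Because an output of one task and an input of another share the same node name, the emitted edges join the per-task representations into a single graph when the text is passed to a DOT renderer.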
Specification