Systems and methods for scheduling data flow execution based on an arbitrary graph describing the desired data flow
Abstract
The data transformation system (DTS) in one embodiment of the present invention comprises a capability to receive data from a data source, a data destination and a capability to store transformed data therein, and a data transformation pipeline (DTP). The DTP constructs complex end-to-end data transformation functionality (data flow executions, or DFEs) by pipelining data flowing from one or more sources to one or more destinations through various interconnected nodes, which become components in the pipeline when instantiated and which transform the data as it flows by. The term "transforming" is used herein to broadly describe the universe of interactions that can be conducted to, with, by, or on data. Each component in the pipeline possesses specific predefined data transformation functionality, and the logical connections between components define the data flow pathway in an operational sense.
The data transformation pipeline (DTP) enables a user to develop complex end-to-end data transformation functionality (the DFEs) by graphically describing and representing, via a graphical user interface (GUI), a desired data flow from one or more sources to one or more destinations through various interconnected nodes (a graph). Each node that the user selects and incorporates into the graph represents specific predefined data transformation functionality (a component), and the connections between the nodes define the data flow pathway.
41 Claims
1. A data transformation service comprising:
a data retrieval system to receive data from a source;
a data transformation pipeline comprising:
a plurality of component objects;
a graphical user interface by which a user can represent a data transformation as a series of interconnected nodes in a graph, each node corresponding to a component object from among the plurality of component objects;
an interpreter that traverses the graph and translates the graph into a data flow execution plan and at least one work list, said list comprising at least one work item;
a pipeline engine to build the data flow execution based on the data flow execution plan, said data flow execution comprising a set of components instantiated from the plurality of component objects; and
a scheduler that executes at least one work item in at least one work list;
a destination data storage system to store data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A data transformation pipeline comprising:
a plurality of component objects;
a graphical user interface by which a user can represent a data transformation as a series of interconnected nodes in a graph, each node corresponding to a component object from among the plurality of component objects;
an interpreter that traverses the graph and translates the graph into a data flow execution plan and at least one work list, said list comprising at least one work item;
a pipeline engine to build the data flow execution based on the data flow execution plan, said data flow execution comprising a set of components instantiated from the plurality of component objects; and
a scheduler that executes at least one work item in at least one work list.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
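The division of labor recited in claims 1 and 14 (graph, interpreter, pipeline engine, scheduler) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes an acyclic graph, a single work list, and one work item per node, and it substitutes a standard topological sort for the interpreter's traversal. The names `Node`, `interpret`, `build_work_list`, and `run` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = dict


@dataclass
class Node:
    """A graph node (hypothetical type) bound to one component object."""
    name: str
    component: Callable[[List[Row]], List[Row]]
    downstream: List[str] = field(default_factory=list)


def interpret(graph: Dict[str, Node]) -> List[str]:
    """Interpreter stand-in: traverse the graph and return an execution plan.

    Assumption: the plan is a topological order (Kahn's algorithm), so every
    node appears after all of its upstream nodes.
    """
    indegree = {name: 0 for name in graph}
    for node in graph.values():
        for child in node.downstream:
            indegree[child] += 1
    ready = deque(name for name, degree in indegree.items() if degree == 0)
    plan = []
    while ready:
        name = ready.popleft()
        plan.append(name)
        for child in graph[name].downstream:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(plan) != len(graph):
        raise ValueError("this sketch only handles acyclic graphs")
    return plan


def build_work_list(graph: Dict[str, Node], plan: List[str]):
    """Pipeline-engine stand-in: turn the plan into a list of work items."""
    upstream = {name: [] for name in graph}
    for node in graph.values():
        for child in node.downstream:
            upstream[child].append(node.name)
    outputs: Dict[str, List[Row]] = {}

    def make_item(name: str) -> Callable[[], None]:
        def item() -> None:
            # Gather rows produced by upstream nodes, then run this component.
            rows = [row for up in upstream[name] for row in outputs[up]]
            outputs[name] = graph[name].component(rows)
        return item

    return [make_item(name) for name in plan], outputs


def run(work_list: List[Callable[[], None]]) -> None:
    """Scheduler stand-in: execute each work item in list order."""
    for item in work_list:
        item()


graph = {
    "source": Node("source", lambda _: [{"x": 1}, {"x": 2}, {"x": 3}], ["double"]),
    "double": Node("double", lambda rows: [{"x": r["x"] * 2} for r in rows], ["sink"]),
    "sink": Node("sink", lambda rows: rows),
}
plan = interpret(graph)
work_list, outputs = build_work_list(graph, plan)
run(work_list)
```

After `run(work_list)` completes, `outputs["sink"]` holds the doubled rows. A real scheduler would presumably dispatch work items across threads rather than run them sequentially; the sequential loop is a simplification made here.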
27. A method for transforming data comprising:
extracting data from at least one external data source;
storing the data in a buffer and establishing a set of primary pointers to the data;
passing the set of primary pointers to the data in the buffer to a first component in order for the first component to transform the data directly in the buffer; and
loading the data from the buffer to at least one external data destination.
Dependent claims: 28, 29, 30, 31, 32, 33
34. A method for transforming data comprising:
extracting data from a source, said data comprising n rows wherein n is any positive integer number greater than zero;
writing the data to a buffer;
creating n pointers wherein each pointer uniquely points to a single row of data from among the n rows of data in the buffer;
passing the n pointers to a next transformation object in a path, the next transformation object being the first transformation object on the first pass, the second transformation object on the second pass, and so forth;
enabling the transformation object to transform the data in the buffer, said transformation object directly accessing the data in the buffer via the pointers;
returning to the element of passing the n pointers to the next transformation and proceeding from there if there remain any transformations unexecuted in the path;
reading the data from the buffer; and
loading the data to a destination.
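The loop recited in claim 34 (and, more briefly, in claims 27 and 35) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: Python has no raw pointers, so each "pointer" is modeled as an integer index into the buffer, and the names `run_path`, `uppercase_name`, and `add_length` are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A transformation object (hypothetical signature): it receives the shared
# buffer and the row pointers, and mutates rows in place.
Transform = Callable[[List[Row], List[int]], None]


def uppercase_name(buffer: List[Row], pointers: List[int]) -> None:
    for p in pointers:
        buffer[p]["name"] = str(buffer[p]["name"]).upper()


def add_length(buffer: List[Row], pointers: List[int]) -> None:
    for p in pointers:
        buffer[p]["length"] = len(str(buffer[p]["name"]))


def run_path(source_rows: List[Row], path: List[Transform],
             destination: List[Row]) -> List[Row]:
    # Write the extracted rows to the buffer (copied so the source is untouched).
    buffer = [dict(row) for row in source_rows]
    # Create n pointers, each uniquely identifying one row in the buffer.
    pointers = list(range(len(buffer)))
    # Pass the same pointers to each transformation object in turn; no rows
    # are copied between steps because every step edits the buffer directly.
    for transform in path:
        transform(buffer, pointers)
    # Read the rows back from the buffer and load them to the destination.
    destination.extend(buffer[p] for p in pointers)
    return buffer


source = [{"name": "ada"}, {"name": "grace"}]
destination: List[Row] = []
buffer = run_path(source, [uppercase_name, add_length], destination)
```

Because every transformation works through the same pointers, the rows that reach `destination` are the very objects held in `buffer`; only the initial write into the buffer involves a copy.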
35. A computer-readable medium bearing computer-readable instructions for:
extracting data from at least one external data source;
storing the data in a buffer and establishing a set of primary pointers to the data;
passing the set of primary pointers to the data in the buffer to a first component in order for the first component to transform the data directly in the buffer; and
loading the data from the buffer to at least one external data destination.
Dependent claims: 36, 37, 38, 39, 40, 41
Specification