System for performing data transformations using a set of independent software components
First Claim
1. A method for transforming data from a source data store for loading onto a target data store, the method comprising:
- obtaining a transformation graph of a set of components, each component executable as an independent service and performing a predefined operation, the transformation graph representing one or more sequences of operations to transform data from one or more source data stores to a format for updating the target data store, wherein the data from the one or more source data stores corresponds to one or more accounts during a current day and a previous day;
initiating an execution of the set of components based on the transformation graph, each component executing as a separate process, wherein the execution of the set of components includes executing:
a first source component that performs operations including:
reading the data associated with the current day from a first of the one or more source data stores; and
converting the data associated with the current day to a first table data structure;
a second source component that performs operations including:
reading data from a second of the one or more source data stores; and
converting the data to a second table data structure; and
a third source component that performs operations including:
reading the data associated with the previous day from a third of the one or more source data stores; and
converting the data associated with the previous day to a third table data structure;
one or more transformation components that perform operations including:
transforming the data of the first table data structure as part of a first sequence of transformation operations;
transforming the data of the second table data structure as part of a second sequence of transformation operations; and
transforming the data of the third table data structure as part of a third sequence of transformation operations, wherein the third sequence of transformation operations comprises:
identifying updated data based upon a comparison of the data associated with the current day and the data associated with the previous day;
combining the updated data with data corresponding to one or more new accounts to produce combined data; and
joining the combined data with a copy of the data of the third table data structure to produce joined data; and
one or more target components that perform operations including:
writing one or more output files for updating the target data store, wherein the one or more output files comprise:
a first output file, based at least partially upon the updated data, that includes an updated list of the one or more accounts that is synchronized with changes to the one or more accounts within or between the previous day and the current day; and
a second output file, based at least partially upon the joined data, that includes the changes that occur within or between the previous day and the current day to the one or more accounts; and
initiating a loading of the one or more output files to the target data store.
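The third sequence of transformation operations recited above (compare current-day and previous-day data, combine the updated rows with new-account rows, then join against a copy of the previous-day table) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the claim does not name a key column, a row format, or a join type, so `account_id`, the list-of-dicts tables, the helper names, and the left-join semantics are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]  # hypothetical row format: one dict per account record


def identify_updated(current: List[Row], previous: List[Row], key: str = "account_id") -> List[Row]:
    """Return current-day rows whose key exists on the previous day but whose contents differ."""
    prev_by_key = {row[key]: row for row in previous}
    return [row for row in current if row[key] in prev_by_key and row != prev_by_key[row[key]]]


def identify_new(current: List[Row], previous: List[Row], key: str = "account_id") -> List[Row]:
    """Return current-day rows whose key does not appear on the previous day."""
    prev_keys = {row[key] for row in previous}
    return [row for row in current if row[key] not in prev_keys]


def left_join(left: List[Row], right: List[Row], key: str = "account_id", suffix: str = "_prev") -> List[Row]:
    """Left-join `left` to `right` on `key`; right-hand columns get `suffix` (join type is assumed)."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(row)
        for col, val in right_by_key.get(row[key], {}).items():
            if col != key:
                merged[col + suffix] = val
        joined.append(merged)
    return joined


# Hypothetical sample data standing in for the first and third table data structures.
current_day = [
    {"account_id": 1, "balance": 100},
    {"account_id": 2, "balance": 250},
    {"account_id": 3, "balance": 75},
]
previous_day = [
    {"account_id": 1, "balance": 100},
    {"account_id": 2, "balance": 200},
]

updated = identify_updated(current_day, previous_day)          # changed accounts
combined = updated + identify_new(current_day, previous_day)   # changed plus new accounts
joined = left_join(combined, [dict(r) for r in previous_day])  # join with a copy of previous-day data
```

In this sketch, `updated` would feed the first output file and `joined` the second; how the claimed method actually derives each file is not specified beyond "based at least partially upon".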
1 Assignment
0 Petitions
Abstract
Described is a system (and method) that provides a framework for performing data transformations, which may be part of an Extract, Transform, and Load (ETL) process. The system may perform a data transformation by creating a pipeline that executes a set of independent software components (or components, plugins, add-ons, etc.). The components may be executed as individual services (e.g., microservices) that may be provided within containers to allow the components to be deployed as self-contained units on various types of host systems including cloud-based infrastructures. In addition, to provide further flexibility for the framework, the components may be implemented using preexisting software libraries.
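As a rough illustration of the pipeline idea in the abstract, the sketch below runs a small set of components in the order implied by a dependency graph. It is a simplification under stated assumptions: the component names, the `run_pipeline` helper, and the sample data are hypothetical, and the components run in a single process here, whereas the described system executes each one as an independent service (for example, a containerized microservice).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Table = List[dict]                              # hypothetical table data structure
Component = Callable[[Dict[str, Table]], Table]  # takes upstream outputs, returns a table


def source_current(_: Dict[str, Table]) -> Table:
    """Source component: stands in for reading a source data store."""
    return [{"account_id": 1, "balance": 100}, {"account_id": 2, "balance": 250}]


def keep_large_balances(inputs: Dict[str, Table]) -> Table:
    """Transformation component: keeps rows with a balance of at least 200."""
    return [row for row in inputs["source_current"] if row["balance"] >= 200]


def write_target(inputs: Dict[str, Table]) -> Table:
    """Target component: stands in for writing an output file."""
    return [dict(row, loaded=True) for row in inputs["keep_large_balances"]]


COMPONENTS: Dict[str, Component] = {
    "source_current": source_current,
    "keep_large_balances": keep_large_balances,
    "write_target": write_target,
}

# Transformation graph: each component maps to the set of components it depends on.
GRAPH: Dict[str, Set[str]] = {
    "source_current": set(),
    "keep_large_balances": {"source_current"},
    "write_target": {"keep_large_balances"},
}


def run_pipeline(graph: Dict[str, Set[str]], components: Dict[str, Component]) -> Dict[str, Table]:
    """Execute every component after its dependencies, passing upstream outputs along."""
    results: Dict[str, Table] = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = components[name](upstream)
    return results


results = run_pipeline(GRAPH, COMPONENTS)
```

Keeping each component behind the same narrow interface (upstream tables in, one table out) is what would let a component be swapped for a separately deployed service without changing the graph.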
13 Citations
17 Claims
1. A method for transforming data from a source data store for loading onto a target data store (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for transforming data, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions, which when executed by the one or more processors, cause the one or more processors to perform operations comprising:
initiating an execution of a data pipeline for a set of components to perform one or more sequences of operations to transform data from one or more source data stores to a format for updating a target data store, wherein the data from the one or more source data stores corresponds to one or more accounts during a current day and a previous day, wherein each component is executable as an independent service and performs a predefined operation of the sequence, and wherein the execution of the data pipeline for the set of components includes executing:
a first source component that performs operations including:
reading the data associated with the current day from a first of the one or more source data stores; and
converting the data associated with the current day to a first table data structure;
a second source component that performs operations including:
reading data from a second of the one or more source data stores; and
converting the data to a second table data structure; and
a third source component that performs operations including:
reading the data associated with the previous day from a third of the one or more source data stores; and
converting the data associated with the previous day to a third table data structure;
one or more transformation components that perform operations including:
transforming the data of the first table data structure as part of a first sequence of transformation operations;
transforming the data of the second table data structure as part of a second sequence of transformation operations; and
transforming the data of the third table data structure as part of a third sequence of transformation operations, wherein the third sequence of transformation operations comprises:
identifying updated data based upon a comparison of the data associated with the current day and the data associated with the previous day;
combining the updated data with data corresponding to one or more new accounts to produce combined data; and
joining the combined data with a copy of the data of the third table data structure to produce joined data; and
one or more target components that perform operations including:
writing one or more output files for updating the target data store, wherein the one or more output files comprise:
a first output file, based at least partially upon the updated data, that includes an updated list of the one or more accounts that is synchronized with changes to the one or more accounts within or between the previous day and the current day; and
a second output file, based at least partially upon the joined data, that includes the changes that occur within or between the previous day and the current day to the one or more accounts; and
initiating a loading of the one or more output files to the target data store.
Dependent claims: 12, 13, 14
15. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors of a system, cause the system to perform operations comprising:
identifying one or more source data stores that provide data to be extracted by the system and loaded onto a target data store, wherein the data from the one or more source data stores corresponds to one or more accounts during a current day and a previous day;
creating a transformation graph of a set of components, each component executable as an independent service and performing a predefined operation, the transformation graph representing one or more sequences of operations to transform data from the one or more source data stores to a format for updating the target data store;
initiating an execution of the set of components based on the transformation graph, each component executing as a separate process, wherein the execution of the set of components includes executing:
a first source component that performs operations including:
reading the data associated with the current day from a first of the one or more source data stores; and
converting the data associated with the current day to a first table data structure;
a second source component that performs operations including:
reading data from a second of the one or more source data stores; and
converting the data to a second table data structure; and
a third source component that performs operations including:
reading the data associated with the previous day from a third of the one or more source data stores; and
converting the data associated with the previous day to a third table data structure;
one or more transformation components that perform operations including:
transforming the data of the first table data structure as part of a first sequence of transformation operations;
transforming the data of the second table data structure as part of a second sequence of transformation operations; and
transforming the data of the third table data structure as part of a third sequence of transformation operations, wherein the third sequence of transformation operations comprises:
identifying updated data based upon a comparison of the data associated with the current day and the data associated with the previous day;
combining the updated data with data corresponding to one or more new accounts to produce combined data; and
joining the combined data with a copy of the data of the third table data structure to produce joined data; and
one or more target components that perform operations including:
writing one or more output files for updating the target data store, wherein the one or more output files comprise:
a first output file that includes an updated list of the one or more accounts that is synchronized with changes to the one or more accounts within or between the previous day and the current day; and
a second output file, based at least partially upon the joined data, that includes the changes that occur within or between the previous day and the current day to the one or more accounts; and
initiating a loading of the one or more output files to the target data store.
Dependent claims: 16, 17
Specification