Method and architecture for automated optimization of ETL throughput in data warehousing applications
Abstract
A computer software architecture that automatically optimizes the throughput of the data extraction/transformation/loading (ETL) process in data warehousing applications. This architecture has a componentized aspect and a pipeline-based aspect. The componentized aspect refers to the fact that every transformation in this architecture is built from transformation components selected from an extensible set. Besides simplifying source code maintenance and adjustment for data warehouse users, these transformation components also give users the building blocks to construct pertinent, functionally sophisticated transformations in a pipelined manner. Within a pipeline, each transformation component automatically stages or streams its data to optimize ETL throughput. Furthermore, each transformation component either pushes data to another transformation component, pulls data from another transformation component, or performs a push/pull operation on the data. The pipelining, staging/streaming, and pushing/pulling features of the transformation components thereby effectively optimize the throughput of the ETL process.
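The staging-versus-streaming distinction in the abstract can be illustrated with a minimal sketch. All names here (`filter_rows`, `sort_rows`, `load`, the row dictionaries) are hypothetical and not from the patent: the idea is that a row-wise rule (such as a filter) can stream each row onward immediately, while a set-wise rule (such as a sort) must first stage its entire input, and each component makes that choice itself rather than requiring the user to decide.

```python
# Illustrative sketch only; component names are hypothetical, not from the patent.

def filter_rows(rows, predicate):
    """Streaming component: a row-wise rule, so each qualifying row is
    emitted downstream immediately without buffering."""
    for row in rows:
        if predicate(row):
            yield row

def sort_rows(rows, key):
    """Staging component: a set-wise rule, so the full input must be
    buffered (staged) before any row can be emitted."""
    staged = list(rows)   # staging step: materialize the upstream data
    staged.sort(key=key)
    yield from staged     # then stream the staged rows downstream

def load(rows, target):
    """Target: pulls data through the whole pipeline into storage."""
    target.extend(rows)

# Pipeline: source -> filter (streams) -> sort (stages) -> target
source = [{"id": 3}, {"id": 1}, {"id": 2}]
target = []
load(sort_rows(filter_rows(source, lambda r: r["id"] != 2),
               key=lambda r: r["id"]),
     target)
# target == [{"id": 1}, {"id": 3}]
```

Because the filter is a generator, the final `load` call pulls rows through the chain on demand, which corresponds loosely to the pull operation the abstract describes.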
16 Claims
1. A computer implemented method for transforming data in a data warehousing application, comprising the steps of:
specifying at least one source containing data;
constructing a plurality of transformation components for manipulating data according to pre-determined sets of rules;
coupling the transformation components to form one or more pipelines;
specifying a target for storing data generated by one or more of the pipelines;
staging data in a first of said plurality of transformation components; and
streaming data in a second of said plurality of transformation components, wherein said staging and said streaming of data are performed automatically by software without human intervention.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A computer-readable medium having stored thereon instructions for causing a computer to transform data in a datamart application, comprising:
a source containing original data;
a plurality of transformation components for manipulating data according to pre-determined behaviors;
a mapping which specifies an order for coupling the transformation components to form one or more pipelines;
a target for storing data generated by one or more of the pipelines;
memory for staging said data generated by a first of said plurality of transformation components;
a second of said plurality of transformation components operable to stream said data generated by said second of said plurality of transformation components; and
instructions for automatically staging or streaming of data by each of the plurality of transformation components.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
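Claim 9 recites "a mapping which specifies an order for coupling the transformation components" into a pipeline. A minimal sketch of that idea follows; the component names, the `REGISTRY` dictionary, and `build_pipeline` are all illustrative assumptions, not elements disclosed in the patent.

```python
# Hypothetical sketch of claim 9's "mapping": an ordered list of component
# names that determines how components are coupled into one pipeline.

def uppercase(rows):
    """Streaming component: transforms each row as it passes through."""
    for row in rows:
        yield row.upper()

def dedupe(rows):
    """Streaming component with internal state: drops repeated rows."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row

# Extensible set of transformation components, keyed by name.
REGISTRY = {"uppercase": uppercase, "dedupe": dedupe}

def build_pipeline(mapping, source):
    """Couple components in the order given by the mapping."""
    rows = iter(source)
    for name in mapping:
        rows = REGISTRY[name](rows)
    return rows

# The mapping specifies the coupling order: source -> uppercase -> dedupe -> target.
target = list(build_pipeline(["uppercase", "dedupe"], ["a", "b", "a"]))
# target == ["A", "B"]
```

Reordering the mapping list changes how the same components are coupled, which mirrors the claim's separation of the components themselves from the mapping that arranges them.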
Specification