DISTRIBUTED BALANCED OPTIMIZATION FOR AN EXTRACT, TRANSFORM, AND LOAD (ETL) JOB
Abstract
Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
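As an illustration only, the plan-generation step described above could be sketched as a greedy placement over the data flow graph. This is not the method claimed in the patent; it is a minimal sketch under stated assumptions. The function name `plan_segments`, the statistics `stage_cost` and `link_rows`, and the tuning parameter `move_weight` are all hypothetical. Each stage is visited in topological order and mapped to the server that minimizes current load plus a penalty for rows that would have to cross servers; the stages placed on one server, together with the links that stay inside it, form that server's job segment.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def plan_segments(stage_cost, link_rows, servers, move_weight=1.0):
    """Greedily map every stage to exactly one server.

    stage_cost: {stage: estimated processing cost}        (hypothetical statistic)
    link_rows:  {(src, dst): estimated rows on the link}  (hypothetical statistic)
    servers:    list of participating server names
    move_weight: assumed cost of moving one row between servers
    """
    # Collect, for each stage, its upstream stages and the rows they send.
    preds = defaultdict(list)
    for (src, dst), rows in link_rows.items():
        preds[dst].append((src, rows))

    # graphlib expects a mapping {node: iterable of predecessors}.
    graph = {s: [p for p, _ in preds[s]] for s in stage_cost}
    order = TopologicalSorter(graph).static_order()

    load = {srv: 0.0 for srv in servers}
    placement = {}
    for stage in order:
        def score(srv):
            # Rows arriving from predecessors placed on a different server.
            moved = sum(r for p, r in preds[stage] if placement[p] != srv)
            return load[srv] + stage_cost[stage] + move_weight * moved

        best = min(servers, key=score)  # ties go to the earlier server
        placement[stage] = best
        load[best] += stage_cost[stage]

    # A segment holds one server's stages and the links internal to it.
    segments = {srv: {"stages": [], "links": []} for srv in servers}
    for stage, srv in placement.items():
        segments[srv]["stages"].append(stage)
    for src, dst in link_rows:
        if placement[src] == placement[dst]:
            segments[placement[src]]["links"].append((src, dst))
    return placement, segments
```

A larger `move_weight` favors keeping connected stages on the same server (less data movement), while a smaller one favors spreading cost evenly (better balance). Links whose endpoints land on different servers are left out of every segment here; a real planner would need to turn them into explicit transfers.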
24 Claims
1-8. (canceled)
9. A computer program product for distributed balanced optimization of an Extract Transform Load (ETL) job across distributed systems of participating ETL servers, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by at least one processor to perform:
receiving a data flow graph with links and stages for an ETL job to be executed by participating ETL servers;
generating a distributed job execution plan that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems; and
distributing each of the job segments to the participating ETL servers based on the mappings for parallel execution.
Dependent claims: 10, 11, 12, 13, 14, 15.
16. The computer program product of claim 9, wherein a Software as a Service (SaaS) is configured to perform the computer program product operations.
17. A computer system for distributed balanced optimization of an Extract Transform Load (ETL) job across distributed systems of participating ETL servers, comprising:
one or more processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices; and
program instructions, stored on at least one of the one or more computer-readable, tangible storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to perform:
receiving a data flow graph with links and stages for an ETL job to be executed by participating ETL servers;
generating a distributed job execution plan that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems; and
distributing each of the job segments to the participating ETL servers based on the mappings for parallel execution.
Dependent claims: 18, 19, 20, 21, 22, 23, 24.
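For illustration, the final distributing step can be sketched as a concurrent dispatch of segments to their mapped servers. This is a rough sketch under assumptions, not the claimed implementation: `run_segment` is a hypothetical stand-in for whatever remote submission mechanism a real ETL server exposes, and the segment shape (a dict with `stages` and `links` keyed by server name) is assumed for the example. Transfers over links that cross servers are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def run_segment(server, segment):
    """Hypothetical stand-in for submitting one segment to a remote ETL server."""
    return f"{server}: ran {len(segment['stages'])} stage(s)"


def distribute(segments):
    """Submit every non-empty segment to its mapped server concurrently.

    segments: {server: {"stages": [...], "links": [...]}}  (assumed shape)
    Returns {server: result} once all submissions have finished.
    """
    # Servers that received no stages have nothing to execute.
    work = {srv: seg for srv, seg in segments.items() if seg["stages"]}
    with ThreadPoolExecutor(max_workers=max(1, len(work))) as pool:
        futures = {srv: pool.submit(run_segment, srv, seg)
                   for srv, seg in work.items()}
        # result() blocks until that segment's submission completes.
        return {srv: fut.result() for srv, fut in futures.items()}
```

One thread per segment keeps the sketch simple; a production dispatcher would more likely use asynchronous remote calls and handle per-segment failures and retries.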
Specification