DISTRIBUTED BALANCED OPTIMIZATION FOR AN EXTRACT, TRANSFORM, AND LOAD (ETL) JOB
First Claim
1. A method for distributed balanced optimization of an Extract Transform Load (ETL) job across distributed systems of participating ETL servers, comprising:
- receiving a data flow graph with links and stages for an ETL job to be executed by participating ETL servers;
- generating a distributed job execution plan that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems; and
- distributing each of the job segments to the participating ETL servers based on the mappings for parallel execution.
Abstract
Provided are techniques for distributed balanced optimization for an Extract, Transform, and Load (ETL) job across distributed systems of participating ETL servers. A data flow graph with links and stages for an ETL job to be executed by participating ETL servers is received. A distributed job execution plan is generated that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems. Each of the job segments is distributed to the participating ETL servers based on the mappings for parallel execution.
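The planning step described above can be illustrated with a minimal sketch. It is not the method of the patent: it assumes a simple greedy heuristic in which each stage is placed on the server that minimizes estimated rows moved across servers plus that server's current load. The names `Link`, `plan_segments`, `stage_cost`, `pinned`, and `move_weight`, the cost units, and the sample job are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    src: str   # upstream stage name
    dst: str   # downstream stage name
    rows: int  # estimated rows flowing over the link (a hypothetical statistic)


def plan_segments(stage_cost, links, servers, pinned=None, move_weight=1.0):
    """Greedy sketch: map each stage to one server, then group stages into segments.

    stage_cost: dict of stage -> estimated processing cost (hypothetical units)
    links:      list of Link, assumed to form a directed acyclic graph
    servers:    list of participating server names
    pinned:     dict of stage -> server for stages tied to one system
    """
    pinned = pinned or {}
    incoming, outgoing = defaultdict(list), defaultdict(list)
    indegree = {stage: 0 for stage in stage_cost}
    for link in links:
        incoming[link.dst].append(link)
        outgoing[link.src].append(link)
        indegree[link.dst] += 1

    # Kahn's algorithm: visit stages so that upstream stages are placed first.
    ready = sorted(s for s, d in indegree.items() if d == 0)
    order = []
    while ready:
        stage = ready.pop(0)
        order.append(stage)
        for link in outgoing[stage]:
            indegree[link.dst] -= 1
            if indegree[link.dst] == 0:
                ready.append(link.dst)

    load = {srv: 0.0 for srv in servers}
    placement = {}
    for stage in order:
        def score(srv):
            # Rows that would have to cross servers if the stage ran on srv.
            moved = sum(l.rows for l in incoming[stage] if placement[l.src] != srv)
            return move_weight * moved + load[srv] + stage_cost[stage]

        srv = pinned.get(stage) or min(servers, key=score)
        placement[stage] = srv
        load[srv] += stage_cost[stage]

    # A segment holds the stages mapped to one server and the links between them.
    segments = {srv: {"stages": [], "links": []} for srv in servers}
    for stage in order:
        segments[placement[stage]]["stages"].append(stage)
    cross_links = []
    for link in links:
        if placement[link.src] == placement[link.dst]:
            segments[placement[link.src]]["links"].append(link)
        else:
            cross_links.append(link)
    return placement, segments, cross_links


# Hypothetical job: two sources on different servers, a filter, a join, a load.
stage_cost = {"read_orders": 10, "read_customers": 10,
              "filter_orders": 5, "join": 20, "load": 5}
links = [Link("read_orders", "filter_orders", 1000),
         Link("filter_orders", "join", 100),
         Link("read_customers", "join", 500),
         Link("join", "load", 100)]
placement, segments, cross_links = plan_segments(
    stage_cost, links, servers=["A", "B"],
    pinned={"read_orders": "A", "read_customers": "B"})
print(placement)
print(cross_links)
```

In this sample the filter stays on server A next to its source, so only the 100 filtered rows cross to server B for the join instead of the 1000 unfiltered rows. A real planner would need a calibrated cost model; adding row counts to processing cost here is a simplification.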
9 Claims
1. A method for distributed balanced optimization of an Extract Transform Load (ETL) job across distributed systems of participating ETL servers, comprising:
- receiving a data flow graph with links and stages for an ETL job to be executed by participating ETL servers;
- generating a distributed job execution plan that breaks the data flow graph into job segments that each include a subset of the links and stages and map to one participating ETL server from the distributed systems to meet an optimization criteria across the distributed systems, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems; and
- distributing each of the job segments to the participating ETL servers based on the mappings for parallel execution.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9-20. (canceled)
Specification