Distributed balanced optimization for an extract, transform, and load (ETL) job

  • US 10,120,918 B2
  • Filed: 06/07/2016
  • Issued: 11/06/2018
  • Est. Priority Date: 04/24/2015
  • Status: Active Grant
First Claim

1. A method for distributed balanced optimization of an Extract Transform Load (ETL) job across distributed systems of participating ETL servers, comprising:

    receiving a data flow graph with links and stages for an ETL job to be executed by participating ETL servers, wherein the stages include multiple source system stages and one or more partition aggregation stages, wherein the multiple source system stages account for distributed table partitions for a complete data extraction of tables, and wherein the one or more partition aggregation stages combine data extracted from the distributed table partitions;

    generating multiple distributed job execution plans for the data flow graph based on data source mappings that indicate which participating ETL servers have access to which of the tables in particular data sources and link mappings that indicate one or more networks associated with the participating ETL servers for each of the links;

    selecting a distributed job execution plan from the multiple distributed job execution plans that meets an optimization criteria, wherein the distributed job execution plan breaks the data flow graph into job segments that each include a subset of the links and stages and map to a different participating ETL server from the participating ETL servers, wherein the distributed job execution plan utilizes statistics to reduce data movement and redundancies and to balance workloads across the distributed systems; and

    distributing each of the job segments to the participating ETL servers based on the mappings for parallel execution.
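
The generate, select, and distribute steps of claim 1 can be illustrated with a small sketch. This is not the patented implementation; it is a toy model under stated assumptions. The stage names, server names, row counts, per-stage work units, and the cost formula (rows moved between servers plus a weighted workload imbalance) are all hypothetical and chosen only to make the idea concrete. Candidate plans are enumerated by brute force, filtered by a data source mapping, scored with the assumed statistics, and the cheapest plan is split into per-server job segments.

```python
from itertools import product

# Hypothetical data flow graph: two source stages (one per table partition),
# one partition aggregation stage, and one load stage.
STAGES = ["src_p1", "src_p2", "aggregate", "load"]
SERVERS = ["etl_a", "etl_b"]

# Assumed data source mapping: which servers can reach each partition.
SOURCE_ACCESS = {"src_p1": {"etl_a"}, "src_p2": {"etl_b"}}

# Assumed statistics: rows flowing over each link and work units per stage.
LINK_ROWS = {
    ("src_p1", "aggregate"): 1000,
    ("src_p2", "aggregate"): 4000,
    ("aggregate", "load"): 500,
}
STAGE_WORK = {"src_p1": 1, "src_p2": 4, "aggregate": 2, "load": 1}


def candidate_plans():
    """Yield every stage-to-server assignment allowed by the source mapping."""
    for combo in product(SERVERS, repeat=len(STAGES)):
        plan = dict(zip(STAGES, combo))
        if all(plan[s] in allowed for s, allowed in SOURCE_ACCESS.items()):
            yield plan


def plan_cost(plan, balance_weight=100):
    """Illustrative criterion: rows crossing servers plus weighted imbalance."""
    moved = sum(rows for (u, v), rows in LINK_ROWS.items() if plan[u] != plan[v])
    load = {server: 0 for server in SERVERS}
    for stage, server in plan.items():
        load[server] += STAGE_WORK[stage]
    imbalance = max(load.values()) - min(load.values())
    return moved + balance_weight * imbalance


def select_plan():
    """Pick the candidate plan with the lowest cost."""
    return min(candidate_plans(), key=plan_cost)


def job_segments(plan):
    """Group stages by assigned server; each group is one job segment."""
    segments = {}
    for stage in STAGES:
        segments.setdefault(plan[stage], []).append(stage)
    return segments


if __name__ == "__main__":
    best = select_plan()
    print(plan_cost(best), job_segments(best))
```

With these made-up numbers, the selected plan keeps the aggregation and load stages on the server that holds the larger partition, so only the smaller partition's rows cross the network. Raising `balance_weight` shifts the choice toward more evenly loaded servers at the price of more data movement.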
