
Dynamically performing data processing in a data pipeline system

  • US 10,176,217 B1
  • Filed: 09/07/2017
  • Issued: 01/08/2019
  • Est. Priority Date: 07/06/2017
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • in association with a distributed data processing system that implements one or more data transformation pipelines, each of the data transformation pipelines comprising at least a first dataset, a first transformation, a second derived dataset and dataset dependency and timing metadata, detecting an arrival of a new raw dataset or new derived dataset;

    in response to the detecting, obtaining from the dataset dependency and timing metadata a dataset subset comprising those datasets that depend on at least the new raw dataset or new derived dataset;

    for each member dataset in the dataset subset, determining if the member dataset has a dependency on any other dataset that is not yet arrived, and in response to determining that the member dataset does not have a dependency on any other dataset that is not yet arrived;

    initiating a build of a portion of the data transformation pipeline comprising the member dataset and all other datasets on which the member dataset is dependent, without waiting for arrival of other datasets;

    detecting that a cutoff time has occurred, and in response thereto;

    determining that a particular dataset on which the second derived dataset depends has not arrived;

    in response thereto, initiating build operations for all other portions or derived datasets of the data transformation pipeline that have not yet been built but excluding the other portions or derived datasets that depend upon the particular dataset;

    wherein the method is performed using one or more processors.
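The steps recited in the claim can be illustrated with a small in-memory sketch. It is not the patented implementation. The `Pipeline` class, its `deps` mapping (a stand-in for the dataset dependency and timing metadata), and the methods `on_arrival` and `on_cutoff` are all hypothetical names chosen for illustration. The sketch assumes an acyclic dependency graph, treats a build as instantaneous, and looks only one level downstream on each arrival, so the arrival of a derived dataset would be reported by a further `on_arrival` call.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Hypothetical stand-in for the dependency metadata: maps each derived
    # dataset to the datasets it is built from directly. Raw datasets appear
    # only as inputs, never as keys.
    deps: dict[str, set[str]]
    arrived: set[str] = field(default_factory=set)
    built: list[str] = field(default_factory=list)

    def upstream(self, name: str) -> set[str]:
        """Return every dataset `name` depends on, directly or transitively."""
        seen: set[str] = set()
        stack = list(self.deps.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.deps.get(current, ()))
        return seen

    def _build(self, name: str) -> None:
        # Placeholder for running the transformation; a real system would
        # submit a job here. The result is recorded as built and available.
        self.built.append(name)
        self.arrived.add(name)

    def on_arrival(self, name: str) -> list[str]:
        """Build direct dependents of `name` whose inputs have all arrived."""
        self.arrived.add(name)
        started = []
        for ds in sorted(self.deps):
            ready = self.deps[ds] <= self.arrived
            if name in self.deps[ds] and ds not in self.built and ready:
                self._build(ds)
                started.append(ds)
        return started

    def on_cutoff(self) -> tuple[list[str], list[str]]:
        """At the cutoff, build what is still buildable and skip the rest."""
        inputs = set().union(*self.deps.values())
        # Raw datasets that never arrived.
        missing = {d for d in inputs if d not in self.deps and d not in self.arrived}
        # Derived datasets that depend, at any depth, on a missing dataset.
        skipped = sorted(ds for ds in self.deps if self.upstream(ds) & missing)
        started = []
        progress = True
        while progress:  # repeat so that newly built inputs unlock dependents
            progress = False
            for ds in sorted(self.deps):
                if ds in self.built or ds in skipped:
                    continue
                if self.deps[ds] <= self.arrived:
                    self._build(ds)
                    started.append(ds)
                    progress = True
        return started, skipped


# Example: d1 needs raw datasets a and b; d2 needs d1; d3 needs c and d1.
pipeline = Pipeline(deps={"d1": {"a", "b"}, "d2": {"d1"}, "d3": {"c", "d1"}})
pipeline.on_arrival("a")               # nothing is ready yet
pipeline.on_arrival("b")               # d1 is built without waiting for c
started, skipped = pipeline.on_cutoff()  # c never arrived
print(started, skipped)                # ['d2'] ['d3']
```

In this example the cutoff sweep builds `d2`, which does not depend on the missing dataset `c`, and excludes `d3`, which does. The repeated pass in `on_cutoff` is a simple substitute for a topological sort and is adequate only for small graphs.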
