
Universal data pipeline

  • US 9,946,738 B2
  • Filed: 10/06/2016
  • Issued: 04/17/2018
  • Est. Priority Date: 11/05/2014
  • Status: Active Grant
First Claim

1. A method comprising:

  • at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising:

    maintaining a build catalog comprising a plurality of build catalog entries;

    wherein each build catalog entry, of the plurality of build catalog entries, comprises:

      an identifier of a version of a derived dataset corresponding to the build catalog entry,

      one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and

      a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry;

    creating a new version of a particular derived dataset in context of a successful transaction;

    adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction;

    wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program;

    wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program;

    wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and

    wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.
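
The bookkeeping recited in claim 1 can be pictured with a small in-memory sketch. This is an illustration only, not the patented implementation. The names `BuildCatalogEntry`, `BuildCatalog`, and `build`, and the `name@txn-N` identifier format, are hypothetical. The "transaction" is reduced to a single assumption: an entry is recorded only if the derivation program finishes without raising.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    # Identifier of the derived dataset version; here, the commit
    # identifier of the transaction that produced it.
    dataset_version_id: str
    # Dataset build dependencies: identifiers of the child dataset
    # versions that were given to the derivation program as input.
    child_dataset_version_ids: Tuple[str, ...]
    # Derivation program build dependency: identifier of the program
    # version that was executed.
    derivation_program_version_id: str


class BuildCatalog:
    """Hypothetical in-memory build catalog (assumption, not from the claim)."""

    def __init__(self) -> None:
        self._entries: Dict[str, BuildCatalogEntry] = {}
        self._commit_counter = itertools.count(1)

    def entries(self) -> Dict[str, BuildCatalogEntry]:
        # Return a copy so callers cannot mutate the catalog directly.
        return dict(self._entries)

    def build(
        self,
        dataset_name: str,
        program_version_id: str,
        program: Callable[..., Any],
        child_versions: Dict[str, Any],
    ) -> Tuple[BuildCatalogEntry, Any]:
        # Execute the derivation program on the child dataset versions.
        # If it raises, nothing below runs, so no entry is recorded.
        output = program(*child_versions.values())

        # Stand-in for the transaction commit identifier (assumed format).
        commit_id = f"{dataset_name}@txn-{next(self._commit_counter)}"

        entry = BuildCatalogEntry(
            dataset_version_id=commit_id,
            child_dataset_version_ids=tuple(child_versions),
            derivation_program_version_id=program_version_id,
        )
        self._entries[commit_id] = entry
        return entry, output


if __name__ == "__main__":
    catalog = BuildCatalog()
    entry, rows = catalog.build(
        "totals",
        "sum_rows@v3",
        lambda a, b: [sum(a), sum(b)],
        {"a@txn-0": [1, 2], "b@txn-0": [3, 4]},
    )
    print(entry)
    print(rows)
```

Because each entry names the exact child dataset versions and the exact program version, any recorded dataset version could in principle be traced back to its inputs, which appears to be the purpose of the catalog as claimed.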

  • 7 Assignments