Universal data pipeline
First Claim
1. A method comprising:
- at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising;
maintaining a build catalog comprising a plurality of build catalog entries, each build catalog entry comprising
- an identifier of a version of a derived dataset corresponding to the build catalog entry,
- one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
- a derivation program build dependency that is executable to generate the version of the derived dataset corresponding to the build catalog entry;
creating a new version of a particular derived dataset by executing a particular version of a particular derivation program; and
adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, an identifier of the particular version of the particular derivation program, and at least one identifier of one or more particular child dataset versions that were provided as input to the particular derivation program.
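The three operations recited in the claim (maintaining a catalog, creating a new derived version, and recording a new entry) can be illustrated with a minimal sketch. This is not the patented implementation: the names `BuildCatalogEntry`, `BuildCatalog`, and `create_derived_version`, the string identifier scheme, and the in-memory dictionary are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    """One catalog entry (hypothetical layout): which derived version was built,
    from which child dataset versions, by which derivation program version."""
    derived_version_id: str
    child_version_ids: Tuple[str, ...]
    derivation_program_version_id: str


class BuildCatalog:
    """In-memory stand-in for the build catalog; entries are never overwritten."""

    def __init__(self) -> None:
        self._entries: Dict[str, BuildCatalogEntry] = {}

    def add_entry(self, entry: BuildCatalogEntry) -> None:
        if entry.derived_version_id in self._entries:
            raise ValueError(f"entry already exists: {entry.derived_version_id}")
        self._entries[entry.derived_version_id] = entry

    def get(self, derived_version_id: str) -> BuildCatalogEntry:
        return self._entries[derived_version_id]


def create_derived_version(
    catalog: BuildCatalog,
    new_version_id: str,
    program_version_id: str,
    program: Callable[[List[list]], list],
    child_versions: Dict[str, list],
) -> list:
    """Run the derivation program on the child dataset versions, then record
    the inputs and the program version in the catalog."""
    data = program(list(child_versions.values()))
    catalog.add_entry(
        BuildCatalogEntry(
            derived_version_id=new_version_id,
            child_version_ids=tuple(child_versions),  # dict keys, insertion order
            derivation_program_version_id=program_version_id,
        )
    )
    return data
```

Because each entry names the exact input versions and the program version, a derived dataset version could in principle be traced back to, or rebuilt from, what produced it.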
7 Assignments
0 Petitions
Abstract
A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
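The point-in-time property described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the class `VersionedDataset`, its integer timestamps, and its `commit`/`as_of` methods are hypothetical and do not come from the patent.

```python
import bisect
from typing import List, Sequence, Tuple


class VersionedDataset:
    """Append-only list of immutable versions, each stamped with a commit time."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._versions: List[Tuple[str, ...]] = []

    def commit(self, timestamp: int, rows: Sequence[str]) -> int:
        """Store a new immutable version and return its 1-based version number."""
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._timestamps.append(timestamp)
        self._versions.append(tuple(rows))  # tuple: cannot be mutated later
        return len(self._versions)

    def current(self) -> Tuple[str, ...]:
        return self._versions[-1]

    def as_of(self, timestamp: int) -> Tuple[str, ...]:
        """Return the version that was current at the given time."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._versions[i - 1]
```

With versions committed at times 100 and 200, a query at time 150 returns the first version even if some of its rows no longer appear in the current one.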
764 Citations
20 Claims
1. A method comprising:
- at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising;
maintaining a build catalog comprising a plurality of build catalog entries, each build catalog entry comprising
- an identifier of a version of a derived dataset corresponding to the build catalog entry,
- one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
- a derivation program build dependency that is executable to generate the version of the derived dataset corresponding to the build catalog entry;
creating a new version of a particular derived dataset by executing a particular version of a particular derivation program; and
adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, an identifier of the particular version of the particular derivation program, and at least one identifier of one or more particular child dataset versions that were provided as input to the particular derivation program.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer system comprising:
- one or more hardware processors;
- one or more computer programs; and
- one or more storage media storing the one or more computer programs for execution by the one or more hardware processors, the one or more computer programs comprising instructions for performing operations comprising;
maintaining a build catalog comprising a plurality of build catalog entries, each build catalog entry comprising
- an identifier of a version of a derived dataset corresponding to the build catalog entry,
- one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and
- a derivation program build dependency that is executable to generate the version of the derived dataset corresponding to the build catalog entry;
creating a new version of a particular derived dataset by executing a particular version of a particular derivation program; and
adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, an identifier of the particular version of the particular derivation program, and at least one identifier of one or more particular child dataset versions that were provided as input to the particular derivation program.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification