Techniques for configuring and validating a data pipeline deployment
First Claim
1. A method, comprising:
receiving a template that defines a plurality of job definitions;
wherein each particular job definition of the plurality of job definitions corresponds to a particular data processing job, and wherein each particular job definition comprises:
a code identifier that identifies code for processing the particular data processing job;
a plurality of dataset dependency identifiers that identify a plurality of input datasets for the particular data processing job;
a plurality of configuration parameters for processing the particular data processing job;
for each particular job definition of the plurality of job definitions:
based on the template, causing to be displayed a user interface for receiving a plurality of configuration parameter values for the plurality of configuration parameters for the particular job definition;
receiving the plurality of configuration parameter values for the particular job definition;
executing the corresponding particular data processing job for the particular job definition by executing the code for processing the particular data processing job, by using the input datasets for the particular data processing job and the plurality of configuration parameter values;
in response to a command to perform a validation of a target data processing job that corresponds to a target job definition of the plurality of job definitions, executing the target data processing job for the target job definition by executing the code for processing the target data processing job, by using the input datasets for the target data processing job and the plurality of configuration parameter values for the target data processing job;
applying one or more validation criteria to the target data processing job to generate a validation value that indicates a metric of accuracy of the plurality of configuration parameter values for the target data processing job;
wherein the method is performed using one or more processors.
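The claimed flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the claim prescribes no data model, so the `JobDefinition` fields, the `code_registry` lookup from a code identifier to a callable, the `collect_values` callback standing in for the configuration user interface, and the `dedupe` example job are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping

# Hypothetical data model: a record is a dict, a dataset is a list of records.
Record = Dict[str, str]
Dataset = List[Record]
# Hypothetical job code signature: input datasets by identifier, plus parameter values.
JobCode = Callable[[Dict[str, Dataset], Mapping[str, str]], Dataset]


@dataclass
class JobDefinition:
    name: str
    code_id: str                # the claim's "code identifier"
    input_datasets: List[str]   # the claim's "dataset dependency identifiers"
    parameters: List[str]       # the claim's "configuration parameters"


def run_job(job: JobDefinition,
            code_registry: Mapping[str, JobCode],
            datasets: Mapping[str, Dataset],
            values: Mapping[str, str]) -> Dataset:
    """Execute the code named by job.code_id on the job's input datasets."""
    missing = [p for p in job.parameters if p not in values]
    if missing:
        raise ValueError(f"{job.name}: no value for {missing}")
    inputs = {ds_id: datasets[ds_id] for ds_id in job.input_datasets}
    return code_registry[job.code_id](inputs, values)


def deploy(jobs: List[JobDefinition],
           code_registry: Mapping[str, JobCode],
           datasets: Mapping[str, Dataset],
           collect_values: Callable[[JobDefinition], Mapping[str, str]]
           ) -> Dict[str, Dataset]:
    """For each job definition: collect its parameter values, then run the job."""
    outputs: Dict[str, Dataset] = {}
    for job in jobs:
        values = collect_values(job)  # hypothetical stand-in for the user interface
        outputs[job.name] = run_job(job, code_registry, datasets, values)
    return outputs


def dedupe(inputs: Dict[str, Dataset], values: Mapping[str, str]) -> Dataset:
    """Hypothetical job: keep the first record seen for each value of key_field."""
    key = values["key_field"]
    seen, out = set(), []
    for rec in inputs["raw_records"]:
        if rec[key] not in seen:
            seen.add(rec[key])
            out.append(rec)
    return out


jobs = [JobDefinition("dedupe", "dedupe_v1", ["raw_records"], ["key_field"])]
data = {"raw_records": [{"email": "a@x.com"}, {"email": "a@x.com"}, {"email": "b@x.com"}]}
result = deploy(jobs, {"dedupe_v1": dedupe}, data, lambda job: {"key_field": "email"})
print(len(result["dedupe"]))  # 2
```

Checking for missing parameter values before running the code mirrors the claim's ordering, in which values are received for a job definition before its job is executed.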
3 Assignments
0 Petitions
Abstract
Techniques for configuring and validating a data pipeline system deployment are described. In an embodiment, a template is a file or data object that describes a package of related jobs. For example, a template may describe a set of jobs necessary for deduplication of data records or a set of jobs performing machine learning on a set of data records. The template can be defined in a file, such as a JSON blob or XML file. For each job specified in the template, the template may identify a set of dataset dependencies that are needed as input for the processing of that job. For each job specified in the template, the template may further identify a set of configuration parameters needed for deployment of the job. In an embodiment, a server uses the template and the configuration parameter values collected via a graphical user interface (GUI) to generate code for the package of jobs. The code may be stored in a version control system. In an embodiment, the code may be compiled, executed, and deployed to a server for processing the data.
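A hedged sketch of how a JSON template might drive both the configuration form and the generated deployment artifact follows. The JSON layout (`package`, `jobs`, `code`, `datasets`, `parameters`) is an assumed schema, since the abstract says only that a template may be a JSON blob or XML file; `form_fields` and `generate_deployment` are hypothetical helpers, and the emitted JSON deployment spec stands in for the generated code.

```python
import json
from typing import Dict, List

# Hypothetical template layout: per job, a code path, dataset dependencies, parameters.
TEMPLATE_JSON = """
{
  "package": "record-deduplication",
  "jobs": [
    {"name": "normalize", "code": "jobs/normalize.py",
     "datasets": ["raw_records"], "parameters": ["locale"]},
    {"name": "dedupe", "code": "jobs/dedupe.py",
     "datasets": ["normalized_records"], "parameters": ["key_field", "threshold"]}
  ]
}
"""


def form_fields(template: dict) -> List[Dict[str, str]]:
    """One input field per (job, parameter) pair: what a GUI built from the template asks for."""
    return [{"job": job["name"], "parameter": p}
            for job in template["jobs"] for p in job["parameters"]]


def generate_deployment(template: dict, values: Dict[str, Dict[str, str]]) -> str:
    """Merge the collected values into the template and emit a deployment spec as JSON text."""
    jobs = []
    for job in template["jobs"]:
        job_values = values.get(job["name"], {})
        missing = [p for p in job["parameters"] if p not in job_values]
        if missing:
            raise ValueError(f"{job['name']}: missing {missing}")
        jobs.append({"name": job["name"], "code": job["code"],
                     "inputs": job["datasets"],
                     "config": {p: job_values[p] for p in job["parameters"]}})
    return json.dumps({"package": template["package"], "jobs": jobs},
                      indent=2, sort_keys=True)


template = json.loads(TEMPLATE_JSON)
print([f["parameter"] for f in form_fields(template)])
# ['locale', 'key_field', 'threshold']
spec = generate_deployment(template, {
    "normalize": {"locale": "en_US"},
    "dedupe": {"key_field": "email", "threshold": "0.9"},
})
```

Emitting the spec with `sort_keys=True` keeps the output byte-stable across runs, which keeps diffs small if the generated artifact is committed to a version control system as the abstract describes.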
188 Citations
18 Claims
1. A method, comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. One or more non-transitory computer-readable media storing instructions, wherein the instructions, when executed by one or more hardware processors, cause:
receiving a template that defines a plurality of job definitions;
wherein each particular job definition of the plurality of job definitions corresponds to a particular data processing job, and wherein each particular job definition comprises:
a code identifier that identifies code for processing the particular data processing job;
a plurality of dataset dependency identifiers that identify a plurality of input datasets for the particular data processing job;
a plurality of configuration parameters for processing the particular data processing job;
for each particular job definition of the plurality of job definitions:
based on the template, causing to be displayed a user interface for receiving a plurality of configuration parameter values for the plurality of configuration parameters for the particular job definition;
receiving the plurality of configuration parameter values for the particular job definition;
executing the corresponding particular data processing job for the particular job definition by executing the code for processing the particular data processing job, by using the input datasets for the particular data processing job and the plurality of configuration parameter values;
in response to a command to perform a validation of a target data processing job that corresponds to a target job definition of the plurality of job definitions, executing the target data processing job for the target job definition by executing the code for processing the target data processing job, by using the input datasets for the target data processing job and the plurality of configuration parameter values for the target data processing job;
applying one or more validation criteria to the target data processing job to generate a validation value that indicates a metric of accuracy of the plurality of configuration parameter values for the target data processing job.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
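The validation step recited in claims 1 and 10 can be illustrated with a short sketch. The claims leave the validation criteria open; here each criterion is assumed to map the target job's output to a score between 0 and 1, and the validation value is assumed to be their mean. `validate`, `recall_against`, `uniqueness`, the `dedupe` job, and the sample records are all hypothetical. Re-running the same job with two candidate values of `key_field` shows how such a score could rank configuration choices.

```python
from statistics import mean
from typing import Callable, Dict, List, Mapping

Record = Dict[str, str]
Dataset = List[Record]
# Hypothetical: a criterion scores a job's output in [0, 1].
Criterion = Callable[[Dataset], float]
# Hypothetical: a runner executes a job's code given its parameter values.
JobRunner = Callable[[Mapping[str, str]], Dataset]


def validate(target: str,
             runners: Mapping[str, JobRunner],
             values: Mapping[str, Mapping[str, str]],
             criteria: List[Criterion]) -> float:
    """Handle a validation command: re-run the target job with its configured
    values, apply every criterion, and average the scores into one value."""
    output = runners[target](values[target])
    return mean(criterion(output) for criterion in criteria)


def recall_against(reference: set, key: str) -> Criterion:
    """Hypothetical criterion: share of known-distinct reference entities kept in the output."""
    def score(output: Dataset) -> float:
        return len({rec[key] for rec in output} & reference) / len(reference)
    return score


def uniqueness(key: str) -> Criterion:
    """Hypothetical criterion: distinct key values per output record (1.0 means no duplicates)."""
    def score(output: Dataset) -> float:
        return len({rec[key] for rec in output}) / len(output) if output else 1.0
    return score


# Hypothetical input: two different people named Ann, one Bob.
RAW = [{"name": "Ann", "email": "ann@x.com"},
       {"name": "Ann", "email": "ann.b@x.com"},
       {"name": "Bob", "email": "bob@x.com"}]


def dedupe(values: Mapping[str, str]) -> Dataset:
    """Hypothetical job: keep the first record seen for each value of key_field."""
    key, seen, out = values["key_field"], set(), []
    for rec in RAW:
        if rec[key] not in seen:
            seen.add(rec[key])
            out.append(rec)
    return out


criteria = [recall_against({"ann@x.com", "ann.b@x.com", "bob@x.com"}, "email"),
            uniqueness("email")]
for key_field in ("name", "email"):
    score = validate("dedupe", {"dedupe": dedupe},
                     {"dedupe": {"key_field": key_field}}, criteria)
    print(key_field, round(score, 3))
# name 0.833
# email 1.0
```

Deduplicating on `name` wrongly merges the two Anns, so recall drops to 2/3 and the averaged value falls below the `email` configuration's 1.0.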
Specification