System and method for automating ETL application
First Claim
Patent Images
1. A computer-implemented method comprising:
- selecting a plurality of candidate jobs to execute in a current job run, wherein the candidate jobs comprise pending jobs from a previous run that have not been completed due to an error condition and that are to be executed again, or new jobs in which execution has not yet been attempted, and wherein each candidate job performs an extracting, transforming or loading function for populating a database;
storing, in a job status metadata table that stores information about each job'"'"'s status when the previous run was completed in a STATUS field and that further stores information specifying a priority level for each job in a PRIORITY field, a new session identifier value in a SESSIONID field;
selecting a subset of the candidate jobs to execute in the current run;
selecting a first job of the subset of candidate jobs to execute based on parent job-child job relationship information stored for the first job in a parent job metadata table, and further based on the priority level stored for the first job in the job status metadata table;
determining that the first job does not actually exist within a particular repository or that the first job is not compiled; and
based on determining that the first job does not exist within a particular repository or that the first job is not compiled, updating, by one or more computers, the STATUS field for the job in the job status metadata table to reflect that the job has not been completed due to an error condition, and halting execution of all remaining jobs in the subset.
2 Assignments
0 Petitions
Accused Products
Abstract
Novel tools for development and operation of ETL (Extract Transform Load) systems for populating databases. An embodiment uses metadata tables to describe relationships between jobs to run for processing data. These relationships can include parent-child job relation, and priority. These tools create a DataStage Service Architecture (DSA) that helps automate and control the ETL process. Other tools allow developers to easily view and update the ETL process.
52 Citations
20 Claims
-
1. A computer-implemented method comprising:
-
selecting a plurality of candidate jobs to execute in a current job run, wherein the candidate jobs comprise pending jobs from a previous run that have not been completed due to an error condition and that are to be executed again, or new jobs in which execution has not yet been attempted, and wherein each candidate job performs an extracting, transforming or loading function for populating a database; storing, in a job status metadata table that stores information about each job'"'"'s status when the previous run was completed in a STATUS field and that further stores information specifying a priority level for each job in a PRIORITY field, a new session identifier value in a SESSIONID field; selecting a subset of the candidate jobs to execute in the current run; selecting a first job of the subset of candidate jobs to execute based on parent job-child job relationship information stored for the first job in a parent job metadata table, and further based on the priority level stored for the first job in the job status metadata table; determining that the first job does not actually exist within a particular repository or that the first job is not compiled; and based on determining that the first job does not exist within a particular repository or that the first job is not compiled, updating, by one or more computers, the STATUS field for the job in the job status metadata table to reflect that the job has not been completed due to an error condition, and halting execution of all remaining jobs in the subset. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system comprising:
-
one or more computers; and a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising; selecting a plurality of candidate jobs to execute in a current job run, wherein the candidate jobs comprise pending jobs from a previous run that have not been completed due to an error condition and that are to be executed again, or new jobs in which execution has not yet been attempted, and wherein each candidate job performs an extracting, transforming or loading function for populating a database, storing, in a job status metadata table that stores information about each job'"'"'s status when the previous run was completed in a STATUS field and that further stores information specifying a priority level for each job in a PRIORITY field, a new session identifier value in a SESSIONID field, selecting a subset of the candidate jobs to execute in the current run, selecting a first job of the subset to execute based on parent job-child job relationship information stored for the first job in a parent job metadata table, and further based on the priority level stored for the first job in the job status metadata table, determining that the first job does not actually exist within a particular repository or that the first job is not compiled, and based on determining that the first job does not exist within a particular repository or that the first job is not compiled, updating the STATUS field for the job in the job status metadata table to reflect that the job has not been completed due to an error condition, and halting execution of all remaining jobs in the subset. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
-
selecting a plurality of candidate jobs to execute in a current job run, wherein the candidate jobs comprise pending jobs from a previous run that have not been completed due to an error condition and that are to be executed again, or new jobs in which execution has not yet been attempted, and wherein each candidate job performs an extracting, transforming or loading function for populating a database; storing, in a job status metadata table that stores information about each job'"'"'s status when the previous run was completed in a STATUS field and that further stores information specifying a priority level for each job in a PRIORITY field, a new session identifier value in a SESSIONID field; selecting a subset of the candidate jobs to execute in the current run; selecting a first job of the subset to execute based on parent job-child job relationship information stored for the first job in a parent job metadata table, and further based on the priority level stored for the first job in the job status metadata table; determining that the first job does not actually exist within a particular repository or that the first job is not compiled; and based on determining that the first job does not exist within a particular repository or that the first job is not compiled, updating the STATUS field for the job in the job status metadata table to reflect that the job has not been completed due to an error condition, and halting execution of all remaining jobs in the subset. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification