Performance checking component for an ETL job
First Claim
1. A computer-implemented method performed by a processor for generating a performance determination report for an Extract, Transform, Load (ETL) job, comprising:
decomposing an ETL job into two or more stage instances, wherein the two or more stage instances include a first extraction stage instance and a second extraction stage instance;
identifying one or more conditions for each of the stage instances, wherein the one or more conditions include a network reliability condition for the first and second extraction stage instances;
generating a set of tests for each of the identified conditions, wherein the set of tests for the network reliability condition includes a ping test;
generating a first set of test results by performing the sets of tests;
determining a first range for a test result from the first set of test results, wherein determining the first range includes calculating one or more statistical metrics from two or more historical test results, wherein calculating the one or more statistical metrics includes calculating a standard deviation and a measure of dispersion from the two or more historical test results;
determining whether the test result from the first set of test results is outside of the first range, wherein the determining of whether the test result from the first set of test results is outside the first range includes comparing the test result with another test result for the same test performed at a second time, the second time being prior to a first time, wherein the generating of the first set of test results is performed at the first time, and wherein the same test performed is a temporary memory space test that includes determining the amount of volatile memory that is available at a compute node at the first and second times; and
generating the performance determination report, the performance determination report including the test result.
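The range determination and comparison steps of the claim can be illustrated with a minimal sketch. The claim does not say how the statistical metrics are combined into the first range, so the following is an assumption: the range is taken as the mean plus or minus k sample standard deviations of the historical results, the max-minus-min spread stands in for the additional measure of dispersion, and the last historical value stands in for the result obtained at the earlier second time. The names check_result, RangeCheck, and k are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RangeCheck:
    low: float                  # lower bound of the first range
    high: float                 # upper bound of the first range
    spread: float               # max - min of the history (assumed dispersion measure)
    delta_from_previous: float  # current result minus the earlier (second time) result
    outside: bool               # True when the result falls outside [low, high]


def check_result(result: float, history: list[float], k: float = 2.0) -> RangeCheck:
    """Compare a test result against a range built from historical results.

    Assumption: the range is mean +/- k * sample standard deviation.
    """
    if len(history) < 2:
        raise ValueError("two or more historical test results are required")
    center = mean(history)
    sigma = stdev(history)  # sample standard deviation
    low, high = center - k * sigma, center + k * sigma
    previous = history[-1]  # result of the same test at the earlier time
    return RangeCheck(
        low=low,
        high=high,
        spread=max(history) - min(history),
        delta_from_previous=result - previous,
        outside=not (low <= result <= high),
    )


# Hypothetical free volatile memory readings in MiB from earlier runs.
history = [4000.0, 4100.0, 3900.0, 4000.0]
check = check_result(2500.0, history)
print(check.outside, check.delta_from_previous)
```

A larger k widens the range and flags fewer results; the value 2.0 is only a placeholder.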
Abstract
Generation of a performance determination report for an Extract, Transform, Load (ETL) job includes decomposing the ETL job into two or more stage instances, and identifying one or more conditions for each of the stage instances. A set of tests is generated for each of the identified conditions. A first set of test results is generated by performing the sets of tests. It is then determined whether a test result from the first set of test results is outside of a first range. Conditions that can be identified include a non-volatile free memory condition, a network reliability condition, a network configuration condition, an application availability condition, a database availability condition, a database performance condition, a schema validity condition, an installed libraries condition, a configuration parameter condition, a volatile free memory condition, and a third party tool condition.
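The flow summarized in the abstract (stage instances, conditions, tests, results, range check, report) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the Condition enum lists the eleven conditions named in the abstract, while StageInstance, build_report, the test names, the stub tests returning fixed numbers, and the ranges are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class Condition(Enum):
    """The conditions named in the abstract."""
    NON_VOLATILE_FREE_MEMORY = auto()
    NETWORK_RELIABILITY = auto()
    NETWORK_CONFIGURATION = auto()
    APPLICATION_AVAILABILITY = auto()
    DATABASE_AVAILABILITY = auto()
    DATABASE_PERFORMANCE = auto()
    SCHEMA_VALIDITY = auto()
    INSTALLED_LIBRARIES = auto()
    CONFIGURATION_PARAMETER = auto()
    VOLATILE_FREE_MEMORY = auto()
    THIRD_PARTY_TOOL = auto()


@dataclass(frozen=True)
class StageInstance:
    name: str
    conditions: Tuple[Condition, ...]


def build_report(
    stages: List[StageInstance],
    tests: Dict[Condition, Dict[str, Callable[[], float]]],
    ranges: Dict[str, Tuple[float, float]],
) -> List[dict]:
    """Run every test attached to every stage's conditions and flag out-of-range results."""
    report = []
    for stage in stages:
        for condition in stage.conditions:
            for test_name, run in tests.get(condition, {}).items():
                value = run()
                low, high = ranges[test_name]
                report.append({
                    "stage": stage.name,
                    "condition": condition.name,
                    "test": test_name,
                    "result": value,
                    "outside_range": not (low <= value <= high),
                })
    return report


# Stub tests return fixed values; a real check would measure the environment.
stages = [
    StageInstance("extract_1", (Condition.NETWORK_RELIABILITY,)),
    StageInstance("extract_2", (Condition.NETWORK_RELIABILITY, Condition.VOLATILE_FREE_MEMORY)),
]
tests = {
    Condition.NETWORK_RELIABILITY: {"ping_ms": lambda: 12.0},
    Condition.VOLATILE_FREE_MEMORY: {"free_mem_mib": lambda: 512.0},
}
ranges = {"ping_ms": (0.0, 50.0), "free_mem_mib": (2048.0, 8192.0)}
for row in build_report(stages, tests, ranges):
    print(row)
```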
14 Claims
1. A computer-implemented method performed by a processor for generating a performance determination report for an Extract, Transform, Load (ETL) job, comprising:
decomposing an ETL job into two or more stage instances, wherein the two or more stage instances include a first extraction stage instance and a second extraction stage instance;
identifying one or more conditions for each of the stage instances, wherein the one or more conditions include a network reliability condition for the first and second extraction stage instances;
generating a set of tests for each of the identified conditions, wherein the set of tests for the network reliability condition includes a ping test;
generating a first set of test results by performing the sets of tests;
determining a first range for a test result from the first set of test results, wherein determining the first range includes calculating one or more statistical metrics from two or more historical test results, wherein calculating the one or more statistical metrics includes calculating a standard deviation and a measure of dispersion from the two or more historical test results;
determining whether the test result from the first set of test results is outside of the first range, wherein the determining of whether the test result from the first set of test results is outside the first range includes comparing the test result with another test result for the same test performed at a second time, the second time being prior to a first time, wherein the generating of the first set of test results is performed at the first time, and wherein the same test performed is a temporary memory space test that includes determining the amount of volatile memory that is available at a compute node at the first and second times; and
generating the performance determination report, the performance determination report including the test result.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system comprising:
a processor; and
a memory communicatively coupled with the processor, wherein the memory is encoded with instructions that when executed by the processor perform operations comprising:
decomposing an ETL job into two or more stage instances, wherein the two or more stage instances include a first extraction stage instance and a second extraction stage instance;
identifying one or more conditions for each of the stage instances, wherein the one or more conditions include a network reliability condition for the first and second extraction stage instances;
generating a set of tests for each of the identified conditions, wherein the set of tests for the network reliability condition includes a ping test;
generating a first set of test results by performing the sets of tests;
determining a first range for a test result from the first set of test results, wherein determining the first range includes calculating one or more statistical metrics from two or more historical test results, wherein calculating the one or more statistical metrics includes calculating a standard deviation and a measure of dispersion from the two or more historical test results;
determining whether the test result from the first set of test results is outside the first range, wherein the determining of whether the test result from the first set of test results is outside the first range includes comparing the test result with another test result for the same test performed at a second time, the second time being prior to a first time, wherein the generating of the first set of test results is performed at the first time, and wherein the same test performed is a temporary memory space test that includes determining the amount of volatile memory that is available at a compute node at the first and second times; and
generating the performance determination report, the performance determination report including the test result.
Dependent claims: 8, 9, 10, 11
12. A computer program product including a computer readable storage medium having computer readable program instructions stored thereon for causing a processor to perform the following operations to generate a performance determination report for an Extract, Transform, Load (ETL) job:
decomposing an ETL job into two or more stage instances, wherein the two or more stage instances include a first extraction stage instance and a second extraction stage instance, a first data validation stage instance and a second data validation stage instance, a transform stage instance, and a transfer stage instance;
identifying one or more conditions for each of the stage instances, wherein the one or more conditions include a network reliability condition for the first and second extraction stage instances and for the transfer stage instance, the one or more conditions include a database performance condition for the first and second data validation stage instances, and the one or more conditions include a third party tool condition for the transform stage instance;
generating a set of tests for each of the identified conditions, wherein the set of tests for the network reliability condition includes a ping test, the set of tests for the database performance condition includes a database query test, and the set of tests for the third party tool condition includes a test to validate transformed data values;
generating a first set of test results by performing the sets of tests;
determining a first range for a test result from the first set of test results, wherein determining the first range includes calculating one or more statistical metrics from two or more historical test results, wherein calculating the one or more statistical metrics includes calculating a standard deviation and a measure of dispersion from the two or more historical test results;
determining whether the test result from the first set of test results is outside the first range, wherein the determining of whether the test result from the first set of test results is outside the first range includes comparing the test result with another test result for the same test performed at a second time, the second time being prior to a first time, wherein the generating of the first set of test results is performed at a first time, and wherein the same test performed is a temporary memory space test that includes determining the amount of volatile memory that is available at a compute node at the first and second times; and
generating the performance determination report, the performance determination report including the test result.
Dependent claims: 13, 14
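The stage-to-condition-to-test relationships recited in claim 12 can be written down as two lookup tables and expanded into a test plan. This is only a sketch: the pairings come from the claim text, but the table layout, the instance names (extraction_1 and so on), and the plan_tests function are hypothetical, and the claim's open-ended "include" language means a real job could attach further conditions and tests.

```python
# Conditions the claim associates with each kind of stage instance.
STAGE_CONDITIONS = {
    "extraction": ["network reliability"],
    "data validation": ["database performance"],
    "transform": ["third party tool"],
    "transfer": ["network reliability"],
}

# Tests the claim associates with each condition.
CONDITION_TESTS = {
    "network reliability": ["ping test"],
    "database performance": ["database query test"],
    "third party tool": ["transformed data value validation test"],
}

# Hypothetical instance names for the six stage instances the claim recites.
STAGE_INSTANCES = [
    ("extraction_1", "extraction"),
    ("extraction_2", "extraction"),
    ("validation_1", "data validation"),
    ("validation_2", "data validation"),
    ("transform_1", "transform"),
    ("transfer_1", "transfer"),
]


def plan_tests(instances):
    """Expand stage instances into (instance, condition, test) triples."""
    plan = []
    for name, kind in instances:
        for condition in STAGE_CONDITIONS[kind]:
            for test in CONDITION_TESTS[condition]:
                plan.append((name, condition, test))
    return plan


for entry in plan_tests(STAGE_INSTANCES):
    print(entry)
```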
Specification