PLUGGABLE FAULT DETECTION TESTS FOR DATA PIPELINES
First Claim
1. A method for detecting faults related to a data pipeline system, the method comprising:
at one or more computing devices comprising one or more processors and memory storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising:
receiving a plugin comprising one or more instructions representing a test to perform on the data pipeline system and one or more configuration points;
wherein the data pipeline system is configured to receive source data from one or more data sources and configured to apply one or more transformations to the source data to produce transformed data before storage of the transformed data in one or more data sinks;
receiving, via a first graphical user interface, one or more settings corresponding to the one or more configuration points;
receiving test data from the data pipeline system, wherein the test data comprises a metric reflecting an amount of the transformed data after the one or more transformations;
determining to run the test defined by the plugin on the data pipeline system including executing the one or more instructions of the plugin based on the one or more settings for the one or more configuration points and the test data, wherein a result of executing the one or more instructions includes at least a test result status indicator;
wherein the test result status indicator is based, at least in part, on the result of executing the one or more instructions including determining whether the amount of the transformed data is below a threshold amount of data; and
causing display of a second graphical user interface that presents at least the test result status indicator.
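The volume check recited in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `VolumeTestPlugin`, `threshold_rows`, `transformed_row_count`, and the `Status` values are all hypothetical, and the settings dictionary stands in for values that the claim says are received through a graphical user interface.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical test result status indicator."""
    OK = "OK"
    FAULT = "FAULT"


@dataclass
class VolumeTestPlugin:
    """Hypothetical plugin: reports a fault when too little transformed data arrives."""

    # Configuration point (assumed name); its setting would come from the first GUI.
    threshold_rows: int = 0

    def configure(self, settings: dict) -> None:
        # Apply the settings received for the configuration points.
        self.threshold_rows = int(settings["threshold_rows"])

    def run(self, test_data: dict) -> Status:
        # test_data carries a metric reflecting the amount of transformed data.
        amount = test_data["transformed_row_count"]
        # The fault condition is "below the threshold"; equal to it still passes.
        return Status.FAULT if amount < self.threshold_rows else Status.OK


plugin = VolumeTestPlugin()
plugin.configure({"threshold_rows": 1000})
low = plugin.run({"transformed_row_count": 250})
normal = plugin.run({"transformed_row_count": 5000})
print(low.value, normal.value)
```

In this sketch the printed status values stand in for the second graphical user interface, which would present the indicator to an operator.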
Abstract
Discussed herein are embodiments of methods and systems which allow engineers or administrators to create modular plugins which represent the logic for various fault detection tests that can be performed on data pipelines and shared among different software deployments. In some cases, the modular plugins each define a particular test to be executed against data received from the pipeline in addition to one or more configuration points. The configuration points represent configurable arguments, such as variables and/or functions, referenced by the instructions which implement the tests and that can be set according to the specific operation environment of the monitored pipeline.
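To make the idea of configuration points concrete, the sketch below shows one way a single plugin could be shared by two deployments that supply different settings. Everything here is an assumption made for illustration: the `Plugin` class, the `bind` method, the latency test, and the field names `lag_ms` and `lag_seconds` do not come from the patent. One configuration point is a variable (`max_latency_s`) and one is a function (`extract_latency`), matching the abstract's "variables and/or functions".

```python
from typing import Any, Callable, Dict, List


class Plugin:
    """Hypothetical plugin: test logic plus the configuration points it references."""

    def __init__(
        self,
        name: str,
        config_points: List[str],
        test: Callable[[Any, Dict[str, Any]], bool],
    ) -> None:
        self.name = name
        self.config_points = config_points
        self.test = test

    def bind(self, settings: Dict[str, Any]) -> Callable[[Any], bool]:
        """Attach deployment-specific settings and return a ready-to-run test."""
        missing = [p for p in self.config_points if p not in settings]
        if missing:
            raise ValueError(f"{self.name}: missing settings for {missing}")
        return lambda data: self.test(data, settings)


def latency_test(data: Any, cfg: Dict[str, Any]) -> bool:
    # The test logic refers only to configuration points, never to a specific site.
    return cfg["extract_latency"](data) <= cfg["max_latency_s"]


latency_plugin = Plugin("latency", ["extract_latency", "max_latency_s"], latency_test)

# Two deployments reuse the same plugin with different settings.
site_a = latency_plugin.bind(
    {"extract_latency": lambda d: d["lag_ms"] / 1000, "max_latency_s": 5}
)
site_b = latency_plugin.bind(
    {"extract_latency": lambda d: d["lag_seconds"], "max_latency_s": 60}
)

a_passes = site_a({"lag_ms": 7000})
b_passes = site_b({"lag_seconds": 7})
print(a_passes, b_passes)
```

Keeping the settings outside the plugin is what lets the same test logic move between environments unchanged; only the bound values differ.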
20 Claims
3. A fault detection system for detecting faults related to a data pipeline system, the fault detection system comprising:
storage media;
one or more processors; and
one or more programs stored in the storage media and configured for execution by the one or more processors, the one or more programs comprising instructions for:
receiving a plugin comprising a) one or more instructions representing a test to perform on data processed by the data pipeline system and b) one or more configuration points, wherein the data pipeline system is configured to receive source data from one or more data sources and configured to apply one or more transformations to the source data to produce transformed data before storage of the transformed data in one or more data sinks; and
receiving, via a first graphical user interface, one or more settings corresponding to the one or more configuration points;
receiving test data from the data pipeline system, wherein the test data comprises a sample of the transformed data after the one or more transformations;
determining to run the test defined by the plugin on the data pipeline system including executing the one or more instructions of the plugin based on the one or more settings for the one or more configuration points and the test data, wherein a result of executing the one or more instructions includes at least a test result status indicator;
wherein the transformed data comprises tabular data;
wherein the sample comprises a portion of the tabular data;
wherein the test result status indicator is based, at least in part, on the result of executing the one or more instructions including determining:
(a) whether the sample contains a correct number of columns according to a schema for the transformed data,
(b) whether data in each column of the sample adheres to a data type of the column as specified in a schema for the transformed data,
(c) whether data in each column of the sample improperly contains NULL values according to a schema for the transformed data,
or any combination of (a), (b), or (c); and
causing display of a second graphical user interface that visibly presents at least the test result status indicator.
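The three schema checks in claim 3 can be sketched against a small tabular sample. This is an illustrative sketch under stated assumptions: the `Column` type, the `validate_sample` function, the use of Python types to stand in for column data types, and the use of `None` to represent NULL are all hypothetical choices, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Column:
    """Hypothetical schema entry for one column of the transformed data."""
    name: str
    dtype: type
    nullable: bool


def validate_sample(rows: Sequence[Sequence[Any]], schema: List[Column]) -> List[str]:
    """Return fault descriptions; an empty list means the sample passed."""
    faults: List[str] = []
    for i, row in enumerate(rows):
        # (a) the row must have the number of columns the schema declares
        if len(row) != len(schema):
            faults.append(f"row {i}: expected {len(schema)} columns, got {len(row)}")
            continue
        for col, value in zip(schema, row):
            if value is None:
                # (c) NULL is a fault only where the schema forbids it
                if not col.nullable:
                    faults.append(f"row {i}: NULL in non-nullable column {col.name!r}")
            elif not isinstance(value, col.dtype):
                # (b) non-NULL values must match the declared data type
                faults.append(
                    f"row {i}: column {col.name!r} expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return faults


schema = [Column("id", int, nullable=False), Column("name", str, nullable=True)]
sample = [(1, "a"), (2, None), (None, "c"), ("4", "d"), (5,)]

faults = validate_sample(sample, schema)
status = "FAULT" if faults else "OK"
print(status)
for fault in faults:
    print(fault)
```

Returning the individual fault descriptions alongside the overall status is a design choice of this sketch; the claim itself only requires that a status indicator be produced and displayed.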
Specification