Data integrity checks
First Claim
1. A method comprising:
ingesting data through a data pipeline, the data pipeline located between a source database and a local database, the data comprising a plurality of data elements;
identifying a first data type associated with one or more data elements of the plurality of data elements;
identifying a subtype of the first data type;
retrieving a threshold value of the first data type and the subtype, the retrieving the threshold value including:
accessing historical data based on the source database of the ingested data, the historical data comprising a plurality of historical data elements;
determining the threshold value based on the historical data elements of the historical data;
calculating a count of data elements of the identified subtype within the ingested data;
detecting a difference between the threshold value and the count of the data elements of the identified subtype within the ingested data;
generating a human-readable report, the human-readable report including the subtype of each data element and the count of each subtype; and
causing display of the human-readable report at a client device, the human-readable report including a presentation of the count of the data elements of the identified subtype within the ingested data, the threshold value, and the difference between the threshold value and the count of the data elements.
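The counting and threshold steps recited above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the claimed implementation: the claim does not say how the threshold is derived from the historical data, so the sketch assumes the mean per-batch count, and the record layout (`type` and `subtype` keys) and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def subtype_counts(elements, data_type):
    """Count elements of one data type, grouped by subtype."""
    return Counter(e["subtype"] for e in elements if e["type"] == data_type)


def threshold_from_history(historical_batches, data_type, subtype):
    """Assumed rule: threshold = mean count of the subtype per historical batch."""
    counts = [subtype_counts(batch, data_type)[subtype] for batch in historical_batches]
    return mean(counts)


def check_subtype(ingested, historical_batches, data_type, subtype):
    """Compare the ingested count for a subtype against its historical threshold."""
    count = subtype_counts(ingested, data_type)[subtype]
    threshold = threshold_from_history(historical_batches, data_type, subtype)
    return {
        "type": data_type,
        "subtype": subtype,
        "count": count,
        "threshold": threshold,
        "difference": count - threshold,
    }


def make(n, subtype):
    """Build n hypothetical 'date' elements of the given subtype."""
    return [{"type": "date", "subtype": subtype} for _ in range(n)]


# Two historical batches from the same source, then one newly ingested batch.
historical = [make(4, "iso") + make(1, "us"), make(6, "iso")]
ingested = make(2, "iso") + make(3, "us")

result = check_subtype(ingested, historical, "date", "iso")
print(result)
```

A negative difference here means the ingested batch holds fewer elements of the subtype than the historical data would suggest, which is the kind of anomaly the report is meant to surface.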
Abstract
Aspects of the present disclosure relate to performing agnostic data integrity checks on source data, and based on the data integrity checks, generating a human-readable report that may be useable to identify specific errors or anomalies within the source data. Example embodiments involve systems and methods for performing the data integrity checks and generating the human-readable reports. For example, the method may include operations to ingest data from a source database through a data pipeline and into a local database, access the data from the data pipeline, determine a data type of the data, determine subtypes of data elements which make up the data, determine a count of each subtype, and generate a human-readable report, to be displayed at a client device.
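As a rough illustration of the final step in the abstract, the sketch below turns per-subtype counts into a plain-text report. The column layout, the `render_report` name, the record keys, and the `thresholds` mapping are all assumptions made for the example; the disclosure does not prescribe a report format.

```python
from collections import Counter


def render_report(elements, thresholds):
    """Return a plain-text table with one row per (type, subtype) pair.

    thresholds maps (type, subtype) to an expected count; pairs without
    an entry are reported as 'n/a'.
    """
    counts = Counter((e["type"], e["subtype"]) for e in elements)
    lines = [f"{'type':<8}{'subtype':<10}{'count':>6}{'threshold':>10}{'diff':>6}"]
    for (dtype, subtype), count in sorted(counts.items()):
        threshold = thresholds.get((dtype, subtype))
        if threshold is None:
            lines.append(f"{dtype:<8}{subtype:<10}{count:>6}{'n/a':>10}{'n/a':>6}")
        else:
            diff = count - threshold
            lines.append(f"{dtype:<8}{subtype:<10}{count:>6}{threshold:>10}{diff:>+6}")
    return "\n".join(lines)


# Hypothetical ingested elements: two ISO-formatted dates and three US-formatted dates.
elements = (
    [{"type": "date", "subtype": "iso"} for _ in range(2)]
    + [{"type": "date", "subtype": "us"} for _ in range(3)]
)
report = render_report(elements, {("date", "iso"): 5})
print(report)
```

Keeping the report as plain text makes it easy to display at a client device or attach to a log; a real system might equally render HTML or a dashboard view.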
20 Claims
9. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
ingesting data through a data pipeline, the data pipeline located between a source database and a local database, the data comprising a plurality of data elements;
identifying a first data type associated with one or more data elements of the plurality of data elements;
identifying a subtype of the first data type;
retrieving a threshold value of the first data type and the subtype, the retrieving the threshold value including:
accessing historical data based on the source database of the ingested data, the historical data comprising a plurality of historical data elements;
determining the threshold value based on the historical data elements of the historical data;
calculating a count of data elements of the identified subtype within the ingested data;
detecting a difference between the threshold value and the count of the data elements of the identified subtype within the ingested data;
generating a human-readable report, the human-readable report including the subtype of each data element and the count of each subtype;
causing display of the human-readable report at a client device, the human-readable report including a presentation of the count of the data elements of the identified subtype within the ingested data, the threshold value, and the difference between the threshold value and the count of the data elements.
17. A system comprising:
processors; and
a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform operations comprising:
ingesting data through a data pipeline, the data pipeline located between a source database and a local database, the data comprising a plurality of data elements;
identifying a first data type associated with one or more data elements of the plurality of data elements;
identifying a subtype of the first data type;
retrieving a threshold value of the first data type and the subtype, the retrieving the threshold value including:
accessing historical data based on the source database of the ingested data, the historical data comprising a plurality of historical data elements;
determining the threshold value based on the historical data elements of the historical data;
calculating a count of data elements of the identified subtype within the ingested data;
detecting a difference between the threshold value and the count of the data elements of the identified subtype within the ingested data;
generating a human-readable report, the human-readable report including the subtype of each data element and the count of each subtype; and
causing display of the human-readable report at a client device, the human-readable report including a presentation of the count of the data elements of the identified subtype within the ingested data, the threshold value, and the difference between the threshold value and the count of the data elements.
Specification