Performing quality determination of data
Abstract
To perform data quality assurance, data values from a data source at discrete time points up to time point t are received. At least one estimated value is computed based on at least some of the received data values, and the received data values and estimated data values are applied to an algorithm. A data quality determination of the data value for time point t is performed based on the algorithm.
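As a rough illustration of the flow summarized above, the following sketch follows the mean-based variant recited in claim 1: the estimated value is the mean of the earlier data values, and a probability is computed for the value at time point t. The normal-distribution model, the `alpha` cutoff, and the function name `quality_check_at_t` are assumptions made for this example; the claims do not specify them.

```python
import math
from statistics import mean, stdev


def quality_check_at_t(values, alpha=0.01):
    """Flag the newest value (time point t) when it is improbable under a
    normal distribution fitted to the earlier values.

    Assumptions for illustration only: the normal model and the alpha cutoff.
    """
    history, latest = values[:-1], values[-1]
    if len(history) < 2:
        raise ValueError("need at least two earlier values")
    mu = mean(history)        # the estimated value: mean of received values
    sigma = stdev(history)
    if sigma == 0:
        # Degenerate history: any deviation from the mean is treated as suspect.
        p_value = 1.0 if latest == mu else 0.0
    else:
        z = abs(latest - mu) / sigma
        p_value = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
    return {"mean": mu, "p_value": p_value, "suspect": p_value < alpha}


readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 14.5]
result = quality_check_at_t(readings)
print(result["suspect"])  # indication of the data quality determination
```

A fixed cutoff keeps the example short; a real deployment would likely tune it to the data source.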
24 Claims
1. A method of performing data quality assurance, comprising:
receiving data values from a data source at discrete time points up to and including time point t;
computing at least one estimated value based on at least some of the received data values;
applying the received data values and the at least one estimated value to one of a first algorithm and a second algorithm for detecting changes in the data values, wherein the at least one estimated data value comprises predicted data values for the first algorithm, and the at least one estimated data value comprises a mean of the at least some received data values for the second algorithm, wherein the first algorithm computes cumulative sums based on the received data values and predicted data values, and the second algorithm computes a probability distribution function based on the received data values and the mean;
performing a data quality determination of the data value for time point t based on one of the first and second algorithms; and
providing an indication of the data quality determination for output by an output device.
Dependent claims: 2-10.
11. An article comprising at least one storage medium containing instructions that when executed cause a processor to:
receive data values from a data source at discrete time points;
compute predicted values for a forecast period based on a historical data set and at least some of the received data values;
apply the predicted values to a cumulative sums algorithm for detecting a data quality problem in the received data values;
provide an indication of the data quality problem for output by an output device,
wherein detecting the data quality problem in the received data values using the cumulative sums algorithm comprises:
computing residual values based on the received data values and the predicted data values;
calculating aggregate values derived from aggregating the residual values; and
comparing the aggregate values against at least one threshold to detect a change point in the received data values, the change point being one of the discrete time points, and indicating a data quality problem with the data value at the change point,
wherein calculating the aggregate values comprises calculating cumulative sums of the residual values, and wherein comparing the aggregate values comprises comparing the cumulative sums against the at least one threshold;
generate a predictive model based on the historical data set and at least some of the received data values;
wherein computing the predicted data values comprises computing the predicted data values using the predictive model,
wherein detecting the data quality problem in the received data values using the cumulative sums algorithm further comprises:
calculating centered residual values by subtracting an average residual value from corresponding computed residual values,
wherein calculating the cumulative sums comprises calculating the cumulative sums of the centered residual values.
Dependent claims: 12.
13. An article comprising at least one storage medium containing instructions that when executed cause a processor to:
receive data values from a data source at discrete time points;
compute predicted values for a forecast period based on a historical data set and at least some of the received data values;
applying the predicted values to a cumulative sums algorithm for detecting a data quality problem in the received data values;
provide an indication of the data quality problem for output by an output device,
wherein computing the predicted values comprises computing a first set of predicted values using a first predictive model, and wherein the first set of predicted values is applied to the cumulative sums algorithm for detecting the data quality problem,
wherein the instructions when executed cause the processor to further:
generate a first result based on applying the first set of predicted values to the cumulative sums algorithm;
compute a second set of predicted values using a second predictive model;
apply the second set of predicted values to the cumulative sums algorithm for detecting the data quality problem in the received data values; and
generate a second result based on applying the second set of predicted values to the cumulative sums algorithm.
Dependent claims: 14, 15.
16. A system comprising:
a processor; and
a detection module executable on the processor to:
receive a time series of data values;
calculate at least one estimated value based on at least some of the received data values;
detect a change in the received data values with one of a first algorithm and a second algorithm that uses the received data values and the at least one estimated value, wherein the at least one estimated data value comprises predicted data values for the first algorithm, and the at least one estimated data value comprises a mean of the at least some received data values for the second algorithm, wherein the first algorithm computes cumulative sums based on the received data values and predicted data values, and the second algorithm computes a probability distribution function based on the received data values and the mean; and
indicate a data quality problem in the received data values based on the change.
Dependent claims: 17-21.
22. A system comprising:
means for receiving data values from a data source at discrete time points up to time point t;
means for computing at least one estimated value based on at least some of the received data values;
means for applying the received data values and at least one estimated value to one of a first algorithm and a second algorithm for detecting a systematic change in the data values, wherein the at least one estimated data value comprises predicted data values for the first algorithm, and the at least one estimated data value comprises a mean of the at least some received data values for the second algorithm, wherein the first algorithm computes cumulative sums based on the received data values and predicted data values, and the second algorithm computes a probability distribution function based on the received data values and the mean;
means for performing a data quality determination of the data value for time point t based on one of the first and second algorithms for detecting the systematic change in the data values; and
means for providing an indication of the data quality determination for output by an output device.
Dependent claims: 23.
24. A method comprising:
storing a historical data set;
receiving input data values at discrete time points;
calculating predicted data values based on the historical data set and at least some of the received input data values;
computing residual values calculated from differences between the predicted data values and received data values;
determining a data quality problem in a data value at one of the discrete time points with a cumulative sums algorithm that uses the residual values; and
providing an indication of the data quality problem for output by an output device,
wherein the predicted data values comprise a first set of predicted data values computed using a first predictive model, the method further comprising:
generating a first result based on applying the first set of predicted values to the cumulative sums algorithm;
computing a second set of predicted data values using a second predictive model;
applying the second set of predicted data values to the cumulative sums algorithm for determining the data quality problem;
generating a second result based on applying the second set of predicted data values to the cumulative sums algorithm;
determining convergence of the first result and second result;
producing a first output in response to determining that the first result and second result have converged; and
producing a second output in response to determining that the first result and second result have not converged.
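The cumulative sums steps recited in claims 11 and 24 can be sketched roughly as follows: residuals are computed from observed and predicted values, centered on their average, accumulated, and compared against a threshold. The results from two sets of predictions are then checked for agreement. Several details here are assumptions for illustration, not taken from the claims: the function names, the single symmetric threshold, reporting the time point after the largest-magnitude cumulative sum as the change point, and treating "convergence" as both models reporting the same change point. The two prediction lists stand in for the outputs of two hypothetical predictive models.

```python
from itertools import accumulate
from statistics import mean


def cusum_change_point(observed, predicted, threshold):
    """Return the index of the suspected change point, or None.

    Residuals are centered on their average before accumulation. The
    cumulative sum peaks in magnitude just before a shift in level, so the
    following index is reported (an illustrative convention).
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    avg = mean(residuals)
    sums = list(accumulate(r - avg for r in residuals))
    peak = max(range(len(sums)), key=lambda i: abs(sums[i]))
    if abs(sums[peak]) <= threshold:
        return None  # no cumulative sum crosses the threshold
    return min(peak + 1, len(sums) - 1)


def compare_models(observed, predicted_a, predicted_b, threshold):
    """Apply both sets of predictions and report whether the results agree."""
    first = cusum_change_point(observed, predicted_a, threshold)
    second = cusum_change_point(observed, predicted_b, threshold)
    if first == second:
        return ("converged", first)
    return ("not converged", (first, second))


observed = [10, 10, 10, 10, 15, 15, 15, 15]
predicted_a = [10] * 8                           # hypothetical constant model
predicted_b = [10, 10, 10, 10, 10, 15, 15, 15]   # hypothetical last-value model

print(compare_models(observed, predicted_a, predicted_b, threshold=2))
# ('converged', 4)
print(compare_models(observed, predicted_a, predicted_b, threshold=6))
# ('not converged', (4, None))
```

The second call shows why comparing two models can be informative: a model that adapts quickly to the new level produces small residuals after the shift, so its cumulative sums may never cross a higher threshold.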
Specification