Method and system for pre-processing data using the Mahalanobis distance (MD)
Abstract
A computer-implemented method for pre-processing data. The method may include detecting one or more erroneous vectors in a plurality of vectors, detecting one or more erroneous elements in the one or more erroneous vectors, and deleting the detected one or more erroneous elements. The method may also include detecting one or more missing elements in the plurality of vectors. Further, the method may include populating one or more offending vectors that include one or more missing elements and/or deleted erroneous elements with one or more elements that are based on a distance metric.
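The abstract names three detection steps but does not fix how each one is performed. The Python sketch below shows one plausible reading. It assumes NumPy, rows as time-ordered vectors and columns as elements, missing elements encoded as NaN, a reference mean and covariance estimated from complete rows, and a user-chosen distance threshold. The leave-one-out rule for locating the erroneous element (flag the element whose exclusion lowers the Mahalanobis distance the most) is a stand-in chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def mahalanobis(v, mean, cov):
    """Mahalanobis distance sqrt((v - mean)^T cov^-1 (v - mean))."""
    d = np.asarray(v, dtype=float) - mean
    # solve() avoids forming the explicit inverse of the covariance
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def fit_reference(X):
    """Mean and covariance estimated only from rows with no missing (NaN) elements."""
    complete = X[~np.isnan(X).any(axis=1)]
    return complete.mean(axis=0), np.cov(complete, rowvar=False)

def find_missing(X):
    """(row, column) index pairs of missing elements, encoded as NaN."""
    return [tuple(int(i) for i in idx) for idx in np.argwhere(np.isnan(X))]

def find_erroneous_vectors(X, mean, cov, threshold):
    """Indices of complete rows whose MD exceeds an assumed threshold."""
    return [i for i, row in enumerate(X)
            if not np.isnan(row).any() and mahalanobis(row, mean, cov) > threshold]

def find_erroneous_element(v, mean, cov):
    """Index of the element whose exclusion lowers the MD the most.

    Leave-one-out rule (an assumption, not taken from the patent); needs len(v) >= 2.
    The MD of the remaining elements uses the covariance sub-matrix.
    """
    v = np.asarray(v, dtype=float)
    n = len(v)
    best_j, best_md = None, np.inf
    for j in range(n):
        keep = [k for k in range(n) if k != j]
        md = mahalanobis(v[keep], mean[keep], cov[np.ix_(keep, keep)])
        if md < best_md:
            best_j, best_md = j, md
    return best_j
```

With a zero mean and identity covariance the Mahalanobis distance reduces to the Euclidean norm, which makes small examples easy to check by hand; on real data the reference would come from `fit_reference` applied to records believed to be clean.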
21 Claims
1. A computer-implemented method for pre-processing data, comprising:
detecting one or more erroneous vectors in a plurality of vectors;
detecting one or more erroneous elements in the one or more erroneous vectors;
deleting values of the one or more detected erroneous elements from the one or more erroneous vectors;
identifying one or more surrounding vectors surrounding the one or more erroneous vectors with respect to time;
determining a surrounding vector distance metric based on one or more elements of the one or more surrounding vectors corresponding to the one or more erroneous elements of the one or more erroneous vectors;
calculating one or more new values for the one or more erroneous elements such that an erroneous vector distance metric, determined based on at least one or more remaining values of the one or more erroneous vectors, corresponds to the surrounding vector distance metric; and
replacing the deleted values of the one or more erroneous elements with the calculated one or more new values for the one or more erroneous elements of the one or more erroneous vectors.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
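Claim 1 does not say how the new values are calculated. In the simplest case of a single deleted element, the squared Mahalanobis distance is quadratic in the unknown value: writing t for the deviation of that element from its mean, P for the inverse covariance and d for the deviation vector with the deleted entry set to zero, the squared distance is a*t^2 + b*t + c with a = P[j, j], b = 2 * (P[j] . d) and c = d . P . d. The element can therefore be solved for directly. The sketch below assumes that the surrounding vector distance metric is the mean Mahalanobis distance of the surrounding vectors under a fixed reference mean and covariance, keeps the root closest to the average of the surrounding vectors' corresponding elements, and falls back to the distance-minimizing value when no real root exists. These choices, and the name `impute_element`, are illustrative assumptions rather than the procedure the patent specifies.

```python
import numpy as np

def mahalanobis(v, mean, cov):
    """Mahalanobis distance sqrt((v - mean)^T cov^-1 (v - mean))."""
    d = np.asarray(v, dtype=float) - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def impute_element(v, j, surrounding, mean, cov):
    """Return a copy of v whose element j is recomputed so that MD(v) matches
    the mean MD of the surrounding vectors (single-element case only)."""
    v = np.asarray(v, dtype=float).copy()
    target = np.mean([mahalanobis(s, mean, cov) for s in surrounding])
    P = np.linalg.inv(cov)          # precision matrix, cov^-1
    d = v - mean
    d[j] = 0.0                      # the deleted value contributes nothing yet
    a = P[j, j]                     # coefficient of t^2
    b = 2.0 * (P[j] @ d)            # cross terms between element j and the rest
    c = d @ P @ d                   # squared MD of the remaining elements alone
    disc = b * b - 4.0 * a * (c - target ** 2)
    if disc < 0:
        # Target distance unreachable: use the t that minimises the squared MD.
        roots = [-b / (2.0 * a)]
    else:
        r = np.sqrt(disc)
        roots = [(-b + r) / (2.0 * a), (-b - r) / (2.0 * a)]
    # Tie-break (assumption): prefer the root nearest the surrounding vectors' element j.
    guess = np.mean([s[j] for s in surrounding]) - mean[j]
    t = min(roots, key=lambda root: abs(root - guess))
    v[j] = mean[j] + t
    return v
```

When several elements of one vector are deleted, a single distance constraint no longer determines them uniquely, so an additional rule (for example, a least-change criterion) would be needed; that extension is not shown here.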
9. A non-transitory computer-readable medium for use on a computer system configured to perform pre-processing of data, the non-transitory computer-readable medium having computer-executable instructions for performing a method comprising:
detecting one or more erroneous vectors in a plurality of vectors;
detecting one or more erroneous elements in the one or more erroneous vectors;
deleting values of the one or more detected erroneous elements from the one or more erroneous vectors;
identifying one or more surrounding vectors surrounding the one or more erroneous vectors with respect to time;
determining a surrounding vector distance metric based on one or more elements of the one or more surrounding vectors corresponding to the one or more erroneous elements of the one or more erroneous vectors;
calculating one or more new values for the one or more erroneous elements such that an erroneous vector distance metric, determined based on at least one or more remaining values of the one or more erroneous vectors, corresponds to the surrounding vector distance metric; and
replacing the deleted values of the one or more erroneous elements with the calculated one or more new values for the one or more erroneous elements of the one or more erroneous vectors.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
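The claims require surrounding vectors "with respect to time" but leave the size of the window open. One simple interpretation, sketched below, orders the vectors by timestamp, takes up to `k` vectors immediately before and after the erroneous one, and skips any neighbour that is itself flagged as erroneous. The function name, the parameter `k` and the skipping rule are hypothetical choices for illustration.

```python
def surrounding_indices(timestamps, bad_rows, i, k=1):
    """Indices of up to k valid rows immediately before and after row i in time.

    timestamps: one sortable timestamp per row (rows need not be pre-sorted).
    bad_rows:   rows flagged as erroneous; they are never used as neighbours.
    Returns the indices in chronological order.
    """
    # Row indices in chronological order (stable for equal timestamps).
    order = sorted(range(len(timestamps)), key=lambda r: timestamps[r])
    pos = order.index(i)
    bad = set(bad_rows)
    # Walk backwards from row i for earlier neighbours, forwards for later ones.
    before = [r for r in reversed(order[:pos]) if r not in bad][:k]
    after = [r for r in order[pos + 1:] if r not in bad][:k]
    return before[::-1] + after
```

Skipping flagged neighbours keeps one corrupted vector from contaminating the distance target used to repair an adjacent one; a time-based window (for example, all vectors within a fixed number of seconds) would be an equally reasonable reading of the claim.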
17. A computer system, comprising:
at least one input/output interface;
a database configured to store information relevant to a virtual sensor process model; and
a processor configured to:
obtain a set of data records corresponding to a plurality of vectors;
detect one or more erroneous vectors in the plurality of vectors;
detect one or more erroneous elements in the one or more erroneous vectors;
delete values of the detected one or more erroneous elements from the one or more erroneous vectors;
identify one or more surrounding vectors surrounding the one or more erroneous vectors with respect to time;
determine a surrounding vector distance metric based on one or more elements of the one or more surrounding vectors corresponding to the one or more erroneous elements of the one or more erroneous vectors;
calculate one or more new values for the one or more erroneous elements such that an erroneous vector distance metric, determined based on at least one or more remaining values of the one or more erroneous vectors, corresponds to the surrounding vector distance metric; and
replace the deleted values of the one or more erroneous elements with the calculated one or more new values for the one or more erroneous elements of the one or more erroneous vectors.
Dependent claims: 18, 19, 20, 21
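Claim 17 adds a database and a processor that obtains a set of data records corresponding to the vectors. As a minimal sketch, assuming an SQLite table with a timestamp column `ts` and one numeric column per element (the table name `records` and its schema are hypothetical), the records can be loaded as time-ordered vectors with SQL NULL mapped to NaN, and a repaired vector can be written back by its timestamp. Python's built-in `sqlite3` module returns NULL as `None`, which is what the conversion relies on.

```python
import sqlite3
import numpy as np

def load_vectors(conn, table, columns):
    """Return (timestamps, X): one row per data record, ordered by time.

    SQL NULL values become NaN so they can be treated as missing elements.
    """
    # Identifiers cannot be bound as SQL parameters; table and column names
    # are assumed to come from trusted configuration, not user input.
    cols = ", ".join(columns)
    rows = conn.execute(f"SELECT ts, {cols} FROM {table} ORDER BY ts").fetchall()
    ts = np.array([r[0] for r in rows], dtype=float)
    X = np.array([[np.nan if v is None else v for v in r[1:]] for r in rows],
                 dtype=float)
    return ts, X

def store_vector(conn, table, columns, ts, values):
    """Write a repaired vector back to the record identified by its timestamp."""
    assignments = ", ".join(f"{c} = ?" for c in columns)
    # float() converts NumPy scalars to plain Python floats for sqlite3.
    conn.execute(f"UPDATE {table} SET {assignments} WHERE ts = ?",
                 [float(v) for v in values] + [float(ts)])
    conn.commit()
```

Identifying a record by its timestamp assumes timestamps are unique; a production schema would more likely key updates on a primary-key column.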
Specification