Anomaly detection in dynamically evolving data and systems
First Claim
1. A computer implemented method, comprising steps of:
a) receiving multi-dimensional data with multi-dimensional data points, each data point having n features;
b) choosing a plurality m of data points to form an input matrix of size m×n;
c) processing input matrix m×n to obtain a reduced dimension embedding matrix of size m×r to form an embedded space of dimension r that includes a normal cluster, wherein r ≪ n;
d) applying an out-of-sample extension (OOSE) procedure to a newly arrived multidimensional data point (NAMDP) not belonging to the plurality m of data points to compute coordinates of the NAMDP in the embedded space;
e) generate a histogram of density values of the embedded r dimensional data points and on the computed coordinates of the NAMDP;
f) determining, based on the density values whether the NAMDP is normal, belonging to the normal cluster, or abnormal, not belonging to the normal cluster, wherein an abnormal value is mapped to the smallest histogram bin size and normal values consist of the other histogram bin values; and
g) if the NAMDP is abnormal, blocking the abnormal data point, whereby the performing of steps (c)-(f) in an embedded space of dimension r wherein r ≪ n significantly reduces computer memory needs and speeds up computing operations for detection of anomalies.
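Steps (e) and (f) can be illustrated with a minimal sketch, assuming the embedded coordinates are already available. The claim does not say how density is measured or how bins are chosen, so everything here is an assumption made for illustration: density is taken as the number of embedded training points within a fixed `radius`, the "smallest histogram bin" is read as the lowest-density bin, and the names `neighbor_counts`, `in_lowest_density_bin`, `radius` and `bins` are hypothetical.

```python
import numpy as np

def neighbor_counts(points, queries, radius):
    """Count how many rows of `points` lie within `radius` of each query row."""
    dist = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    return (dist <= radius).sum(axis=1)

def in_lowest_density_bin(embedded, new_coords, radius=1.0, bins=10):
    """Return True when the new point's density lands in the lowest histogram bin.

    embedded   : (m, r) array of embedded training points
    new_coords : (r,) embedded coordinates of the newly arrived point
    """
    # Density of each training point, excluding the point itself.
    train_density = neighbor_counts(embedded, embedded, radius) - 1
    # Density of the new point relative to the same training set.
    new_density = neighbor_counts(embedded, new_coords[None, :], radius)
    # Histogram over all density values, training points plus the new point.
    all_density = np.concatenate([train_density, new_density])
    _, edges = np.histogram(all_density, bins=bins)
    # Assumption: "abnormal" means falling below the upper edge of the first bin.
    return bool(new_density[0] < edges[1])
```

Under these assumptions, a point that falls far from a tight cluster has a neighbor count of zero and lands in the first bin, while a point at the cluster center does not. A different density estimate or binning rule would change the decision boundary.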
3 Assignments
0 Petitions
Abstract
Detection of abnormalities in multi-dimensional data is performed by processing the multi-dimensional data to obtain a reduced dimension embedding matrix, using the reduced dimension embedding matrix to form a lower dimension (of at least 2D) embedded space, applying an out-of-sample extension procedure in the embedded space to compute coordinates of a newly arrived data point and using the computed coordinates of the newly arrived data point and Euclidean distances to determine whether the newly arrived data point is normal or abnormal.
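The flow described in the abstract (embed, extend out of sample, compare Euclidean distances) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the text above does not name the embedding, so a diffusion-map-style embedding with a Gaussian kernel and a Nyström-style extension are swapped in, and the decision rule (nearest-neighbor distance against a quantile of training distances) is likewise assumed. All function names and the parameters `eps` and `quantile` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, eps):
    """Gaussian affinities between rows of a and rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / eps)

def fit_embedding(x, r, eps):
    """Embed the m rows of x into r dimensions (diffusion-map style)."""
    k = gaussian_kernel(x, x, eps)
    d = k.sum(axis=1)
    a = k / np.sqrt(np.outer(d, d))            # symmetric form of D^-1 K
    vals, vecs = np.linalg.eigh(a)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][1:r + 1]    # skip the trivial top eigenvector
    lam = vals[order]
    psi = vecs[:, order] / np.sqrt(d)[:, None]  # eigenvectors of D^-1 K
    return lam, psi

def extend(x_train, lam, psi, x_new, eps):
    """Nystrom-style out-of-sample extension: embed x_new without refitting."""
    k = gaussian_kernel(x_new[None, :], x_train, eps)[0]
    total = k.sum()
    if total == 0.0:
        # No measurable affinity to any training point: treat as infinitely far.
        return np.full(psi.shape[1], np.inf)
    return ((k / total) @ psi) / lam

def is_far_from_training(psi, y_new, quantile=0.99):
    """Flag y_new when its nearest embedded neighbor is unusually distant."""
    dist = np.linalg.norm(psi[:, None, :] - psi[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    threshold = np.quantile(dist.min(axis=1), quantile)
    return bool(np.linalg.norm(psi - y_new, axis=1).min() > threshold)
```

One property worth checking in any such sketch is that extending a point that was part of the training set reproduces its embedded coordinates, since the extension formula is the eigenvector equation evaluated at that point. Only the m×r matrix `psi`, the r eigenvalues and the training rows are needed at detection time, which is consistent with the memory argument made in the claims.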
17 Citations
20 Claims
1. A computer implemented method, comprising steps (a)-(g) as recited in full under First Claim above.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
15. An anomaly detection system (ADS), comprising:
- a computer program stored on a non-transitory computer readable medium, the computer program dedicated to performing steps of:
a) receiving multi-dimensional data with multi-dimensional data points, each data point having n features;
b) choosing a plurality m of data points to form an input matrix of size m×n;
c) processing input matrix m×n to obtain a reduced dimension embedding matrix of size m×r to form an embedded space of dimension r that includes a normal cluster, wherein r ≪ n;
d) applying an out-of-sample extension (OOSE) procedure to a newly arrived multidimensional data point (NAMDP) not belonging to the plurality m of data points to compute coordinates of the NAMDP in the embedded space;
e) generate a histogram of density values of the embedded r dimensional data points and on the computed coordinates of the NAMDP;
f) determining, based on the density values whether the NAMDP is normal, belonging to the normal cluster, or abnormal, not belonging to the normal cluster, wherein an abnormal value is mapped to the smallest histogram bin size and normal values consist of the other histogram bin values; and
g) if the NAMDP is abnormal, blocking the abnormal data point, whereby the performing of steps (c)-(f) in an embedded space of dimension r wherein r ≪ n significantly reduces computer memory needs and speeds up computing operations for detection of anomalies.
- Dependent Claims (16, 17, 18, 19, 20)
Specification