Anomaly detection in dynamically evolving data and systems
Abstract
Detection of abnormalities in multi-dimensional data is performed by processing the multi-dimensional data to obtain a reduced dimension embedding matrix, using the reduced dimension embedding matrix to form a lower dimension (of at least 2D) embedded space, applying an out-of-sample extension procedure in the embedded space to compute coordinates of a newly arrived data point and using the computed coordinates of the newly arrived data point and Euclidean distances to determine whether the newly arrived data point is normal or abnormal.
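The out-of-sample step named in the abstract can be illustrated with a Nystrom-style extension. This is a minimal sketch under stated assumptions, not the patented procedure: it assumes a Gaussian affinity with bandwidth `eps` (the abstract does not name a kernel), and the function names `extend_point` and `euclidean` are hypothetical. It computes embedded coordinates for a newly arrived point from training points that are already embedded, and deliberately leaves the normal/abnormal decision rule open.

```python
import math


def gaussian_affinities(x, train, eps):
    """Gaussian affinity between one point x and every training point (assumed kernel)."""
    return [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / eps)
        for t in train
    ]


def extend_point(x, train, train_coords, eigenvalues, eps=1.0):
    """Nystrom-style extension of embedding coordinates to a new point x.

    train_coords[j][k] is the k-th embedded coordinate of training point j,
    and eigenvalues[k] is the matching nonzero eigenvalue of the Markov matrix.
    """
    w = gaussian_affinities(x, train, eps)
    total = sum(w)
    if total == 0.0:
        # Every affinity underflowed: x is too far from the training set for this eps.
        raise ValueError("point has zero affinity to all training points")
    p = [wi / total for wi in w]  # transition probabilities from x to training points
    return [
        sum(p[j] * train_coords[j][k] for j in range(len(train))) / eigenvalues[k]
        for k in range(len(eigenvalues))
    ]


def euclidean(u, v):
    """Euclidean distance between two embedded points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

One property is useful as a sanity check: the constant eigenvector of a Markov matrix (eigenvalue 1) extends to the same constant for any new point, because the transition probabilities sum to 1. Distances from the extended coordinates to the embedded training points can then be computed with `euclidean`.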
18 Claims
1. A method, comprising:
a) obtaining a data set of network traffic comprising N-dimensional data points from a traffic analyzer, wherein N>3 and wherein the traffic analyzer is configured to generate a statistics matrix comprising the N-dimensional data points;
b) by a computer,
  i. processing the statistics matrix into a Markov kernel matrix,
  ii. processing the Markov kernel matrix to obtain processed data points with a dimension r lower than N, wherein the processing includes finding r discriminating eigenvectors by providing i=1, . . . , r eigenvalues and respective associated eigenvectors, generating for each i=2, . . . , r a respective ith cluster based on the ith eigenvector and generating other respective clusters based on eigenvectors 1, . . . , i−1, i+1, . . . , r, computing a distance between each respective ith cluster based on the ith eigenvector and each of the other respective clusters based on eigenvectors 1, . . . , i−1, i−1, . . . , r, and finding r eigenvalues and associated respective eigenvectors that provide the highest distance, the associated respective eigenvectors that provide the highest distance being the discriminating eigenvectors, wherein the r discriminating eigenvectors thus found form an embedded space in which processed data points with a reduced dimension r form a normal cluster,
  iii. detecting an abnormal data point in the processed data points with a dimension r lower than N without relying on a signature of a threat and without use of a threshold, and
  iv. blocking the abnormal data point.
Dependent claims: 2-16.
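Steps b)i and b)ii of claim 1 can be illustrated with a small sketch. It rests on assumptions the claim does not state: a Gaussian affinity with bandwidth `eps` is used to build the Markov matrix, and the function names are hypothetical. It computes only the first non-trivial eigenvector by power iteration and does not implement the cluster-distance selection of discriminating eigenvectors described in the claim.

```python
import math


def markov_matrix(points, eps=1.0):
    """Row-normalize a Gaussian affinity matrix (assumed kernel) into a Markov matrix."""
    n = len(points)
    k = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps) for q in points]
        for p in points
    ]
    d = [sum(row) for row in k]  # row sums (degrees)
    p = [[k[i][j] / d[i] for j in range(n)] for i in range(n)]
    return p, k, d


def _deflate_and_normalize(v, v1):
    """Remove the component of v along the unit vector v1, then scale to unit length."""
    dot = sum(a * b for a, b in zip(v, v1))
    v = [a - dot * b for a, b in zip(v, v1)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def second_eigenvector(k, d, iters=200):
    """Second eigenpair of the Markov matrix via power iteration on its symmetric conjugate."""
    n = len(d)
    # s = D^(-1/2) K D^(-1/2) has the same eigenvalues as the Markov matrix D^(-1) K.
    s = [[k[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    total = math.sqrt(sum(d))
    v1 = [math.sqrt(di) / total for di in d]  # unit top eigenvector of s (eigenvalue 1)
    v = _deflate_and_normalize([float(i + 1) for i in range(n)], v1)
    lam = 0.0
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient estimate
        v = _deflate_and_normalize(w, v1)
    # Map back to a right eigenvector of the Markov matrix.
    psi = [vi / math.sqrt(di) for vi, di in zip(v, d)]
    return lam, psi
```

On data with two well-separated groups, the entries of the returned eigenvector take opposite signs on the two groups, which is the kind of structure a reduced-dimension embedding exposes.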
17. A system, comprising:
a) a traffic analyzer for providing a data set of network traffic comprising N-dimensional data points, wherein N>3 and wherein the traffic analyzer is configured to generate a statistics matrix comprising the N-dimensional data points; and
b) a computer program stored on a non-transitory computer readable medium, the computer program dedicated to;
  i. process the statistics matrix into a Markov kernel matrix,
  ii. process the Markov kernel matrix to obtain processed data points with a reduced dimension r lower than N by forming an embedded space defined by r discriminating eigenvectors, wherein r≦N and wherein the discriminating eigenvectors are selected by providing i=1, . . . , r eigenvalues and respective associated eigenvectors, and, for each i=2, . . . , r, by generating a respective ith cluster based on the ith eigenvector and other clusters based on eigenvectors 1, . . . , i−1, i+1, . . . , M, by computing a distance between each ith cluster and each of the other clusters, and by finding the r eigenvalues and their respective eigenvectors that provide the highest distance to define a normal cluster of processed data points with reduced dimension M,
  iii. detect an abnormal data point in the processed data points with reduced dimension r without relying on a signature of a threat and without use of a threshold, and
  iv. block the abnormal data point.
Dependent claim: 18.
Specification