System and method for outlier detection via estimating clusters
First Claim
1. A method of detecting anomalies in a behavior of a system, implemented by a processor coupled to a memory, the memory having stored therein a set of instructions that, when executed by the processor, cause the processor to perform the method comprising:
(a) providing cluster modeling data for a plurality of clusters to an outlier detection module, the cluster modeling data identifying a number of training points in each of the plurality of clusters;
(b) receiving a query point at the outlier detection module, the query point comprising a plurality of parameters, the query point including training data provided by a plurality of sensors in real-time or near real-time, wherein the sensors provide sensor data, wherein the sensor data includes at least one of pressure data, flow data, position data, acceleration data, velocity data, and temperature data, and wherein the sensor data is utilized to form the query point;
(c) generating, from the plurality of clusters and using the outlier detection module, a group of the clusters closest to the query point, and determining whether the group of closest clusters satisfies a threshold value, wherein the threshold value is a user-defined value;
(d) determining a weighted distance value between the query point and each cluster in the group of closest clusters using the outlier detection module, wherein the weighted distance value, WDV, between the query point and each cluster in the group of closest clusters is determined by:
$\mathrm{WDV} = n \cdot d$, where $n$ is the number of the training points in a cluster and $d$ is the distance between the cluster and the query point;
(e) generating a summary distance value for the query point by combining the weighted distance values between the query point and each of the clusters using the outlier detection module; and
(f) determining if the query point is an outlier based upon the summary distance value using the outlier detection module.
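Steps (a) through (f) can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch under several stated assumptions. Each cluster is modeled by a centroid plus its training-point count. The distance d is the Euclidean distance from the query to that centroid. The user-defined threshold of step (c) is read as the minimum number of training points the group of closest clusters must cover. Step (e)'s combination rule is taken to be the sum of the WDVs divided by the points covered. The names `Cluster`, `cluster_distance`, `summary_distance`, `is_outlier`, `min_points`, and `score_limit` are hypothetical and do not come from the claim.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cluster:
    # Hypothetical cluster model: a centroid plus the number of training
    # points the cluster summarizes (the n in WDV = n * d); assumed > 0.
    centroid: tuple[float, ...]
    count: int


def cluster_distance(query: Sequence[float], cluster: Cluster) -> float:
    # Assumption: d is the Euclidean distance from the query to the centroid.
    return math.dist(query, cluster.centroid)


def summary_distance(query: Sequence[float], clusters: Sequence[Cluster],
                     min_points: int) -> float:
    if not clusters:
        raise ValueError("at least one cluster is required")
    # Step (c): rank clusters by distance and grow the group of closest
    # clusters until it covers at least min_points training points
    # (assumed reading of the user-defined threshold value).
    ranked = sorted(clusters, key=lambda c: cluster_distance(query, c))
    group: list[Cluster] = []
    covered = 0
    for cluster in ranked:
        group.append(cluster)
        covered += cluster.count
        if covered >= min_points:
            break
    # Step (d): weighted distance value WDV = n * d for each cluster in the group.
    wdvs = [c.count * cluster_distance(query, c) for c in group]
    # Step (e): assumed combination rule, the sum of WDVs divided by the
    # number of training points covered (a count-weighted mean distance).
    return sum(wdvs) / covered


def is_outlier(query: Sequence[float], clusters: Sequence[Cluster],
               min_points: int, score_limit: float) -> bool:
    # Step (f): flag the query when its summary distance exceeds a limit.
    return summary_distance(query, clusters, min_points) > score_limit


clusters = [
    Cluster((0.0, 0.0), 50),
    Cluster((1.0, 0.0), 30),
    Cluster((10.0, 10.0), 20),
]
print(summary_distance((0.0, 0.0), clusters, min_points=60))  # 0.375
print(is_outlier((3.0, 4.0), clusters, min_points=60, score_limit=2.0))  # True
```

With this normalization, the summary distance approximates the mean distance from the query to its `min_points` nearest training points. It does so without storing those points individually, because each cluster stands in for `count` points at roughly the same distance.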
Abstract
An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models are automatically derived from training data, typically nominal system data acquired from sensors under normal operating conditions or from detailed simulations. These models are used to identify unusual, out-of-family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined by analyzing the degree to which current system behavior deviates from the models formed from the nominal system data. This deviation is presented as an easy-to-interpret numerical score, along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to “clean” the training data.
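The abstract does not say how the per-parameter contribution is computed. One plausible reading is each parameter's share of the squared deviation between the query and a nominal reference point, sketched below. The reference could be, for example, the centroid of the closest cluster. This is an assumption, not the patented method. The function name `parameter_contributions` is hypothetical, and parameters are assumed to be pre-scaled to comparable units so that no single unit dominates.

```python
from typing import Sequence


def parameter_contributions(query: Sequence[float],
                            reference: Sequence[float]) -> list[float]:
    """Return each parameter's share of the squared deviation.

    `reference` is a hypothetical stand-in for the nominal model, for example
    the centroid of the closest cluster. The shares sum to 1 unless the query
    coincides with the reference.
    """
    if len(query) != len(reference):
        raise ValueError("query and reference must have the same length")
    squared = [(q - r) ** 2 for q, r in zip(query, reference)]
    total = sum(squared)
    if total == 0.0:
        # No deviation at all, so no parameter is responsible for it.
        return [0.0] * len(squared)
    return [s / total for s in squared]


print(parameter_contributions((3.0, 4.0, 0.0), (0.0, 0.0, 0.0)))  # [0.36, 0.64, 0.0]
```

Reporting shares rather than raw differences keeps the output easy to read: the parameter with the largest share is the first candidate to inspect when a sample is flagged.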
18 Claims
1. (Identical to the first claim reproduced above.)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A system for detecting an anomaly in a behavior of a system, comprising:
a processor;
a memory coupled to the processor;
a plurality of input and output devices coupled to the processor, the plurality of devices including a plurality of sensors; and
a data storage coupled to the processor having cluster modeling data for a plurality of clusters stored therein, the cluster modeling data comprising a number of training points in each of the plurality of clusters;
the memory having stored therein a set of instructions that, when executed by the processor, cause the processor to perform the operations of:
receive a query point comprising a plurality of parameters including training data provided by the plurality of sensors in real-time or near real-time, wherein the sensors provide sensor data, wherein the sensor data includes at least one of pressure data, flow data, position data, acceleration data, velocity data, and temperature data, and wherein the sensor data is utilized to form the query point;
generate a group of the clusters closest to the query point from the plurality of clusters, and determine whether the group of closest clusters satisfies a threshold value, wherein the threshold value is a user-defined value;
determine a weighted distance value between the query point and each of the clusters in the group of closest clusters, wherein the weighted distance value, WDV, between the query point and each cluster in the group of closest clusters is determined by:
$\mathrm{WDV} = n \cdot d$, where $n$ is the number of the training points in a cluster and $d$ is the distance between the cluster and the query point;
generate a summary distance value for the query point by combining the weighted distance values between the query point and each of the clusters using the outlier detection module; and
determine if the query point is an outlier based upon the summary distance value.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
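Claim 10 states that sensor data is utilized to form the query point but not how. A minimal sketch under stated assumptions: a reading keyed by channel name is mapped into a fixed channel order. Each value is then z-score normalized with means and standard deviations taken from the nominal training data. Both the fixed ordering and the normalization are illustrative assumptions. The names `CHANNELS` and `form_query_point` are hypothetical, and the channel names simply echo the sensor data types listed in the claim.

```python
from typing import Mapping, Sequence

# Hypothetical fixed channel order built from the sensor data types the claim
# lists; the claim requires only "at least one of" them.
CHANNELS = ("pressure", "flow", "position", "acceleration", "velocity", "temperature")


def form_query_point(reading: Mapping[str, float],
                     means: Sequence[float],
                     stds: Sequence[float],
                     channels: Sequence[str] = CHANNELS) -> tuple[float, ...]:
    # Assumption: each channel is z-score normalized with statistics computed
    # from the nominal training data, so channels with large units do not
    # dominate the distance used for outlier scoring.
    if not (len(channels) == len(means) == len(stds)):
        raise ValueError("channels, means, and stds must have the same length")
    point = []
    for name, mu, sigma in zip(channels, means, stds):
        if name not in reading:
            raise KeyError(f"missing sensor channel: {name}")
        # A channel with zero spread in training carries no information here.
        point.append((reading[name] - mu) / sigma if sigma > 0 else 0.0)
    return tuple(point)


print(form_query_point({"pressure": 120.0, "temperature": 15.0},
                       means=(100.0, 20.0), stds=(10.0, 5.0),
                       channels=("pressure", "temperature")))  # (2.0, -1.0)
```

The same training statistics would have to be applied when building the cluster modeling data. Otherwise query points and clusters would live in different coordinate systems.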
Specification