Anomaly detection for non-stationary data

US 10,445,644 B2
Filed: 12/31/2014
Issued: 10/15/2019
Est. Priority Date: 12/31/2014
Status: Active Grant

Abstract

A method of detecting anomalies in a time series is disclosed. A training time series corresponding to a process is extracted from an initial time series corresponding to the process, the training time series including a subset of the initial time series. Outlier data points in the training time series are modified based on predetermined acceptability criteria. A plurality of prediction methods are trained using the training time series. An actual data point corresponding to the initial time series is received. The plurality of prediction methods are used to determine a set of predicted data points corresponding to the actual data point. It is determined whether the actual data point is anomalous based on a calculation of whether each of the set of predicted data points is statistically different from the actual data point.
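
The steps summarized in the abstract can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the three predictors (mean, last value, linear trend), the median-absolute-deviation clipping rule, the z-score threshold, and all function names are hypothetical choices made for this example. The abstract itself does not say which prediction methods, acceptability criteria, or statistical test are used.

```python
from statistics import mean, median, pstdev

def clip_outliers(series, k=3.0):
    """Assumed acceptability criterion: clamp points beyond k MADs of the median."""
    med = median(series)
    mad = median([abs(x - med) for x in series]) or 1e-9
    lo, hi = med - k * mad, med + k * mad
    return [min(max(x, lo), hi) for x in series]

# Each hypothetical "trainer" returns (next-point prediction, in-sample error spread).
def train_mean(train):
    m = mean(train)
    return m, pstdev([x - m for x in train])

def train_last_value(train):
    diffs = [b - a for a, b in zip(train, train[1:])]
    return train[-1], pstdev(diffs)

def train_linear_trend(train):
    n = len(train)
    xs = list(range(n))
    xbar, ybar = mean(xs), mean(train)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, train)) / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, train)]
    return intercept + slope * n, pstdev(resid)

PREDICTORS = (train_mean, train_last_value, train_linear_trend)

def detect(initial, actual, length=20, offset=0, z=3.0):
    """Return (is_anomalous, per-method flags) for one newly received point."""
    end = len(initial) - offset
    train = clip_outliers(initial[end - length:end])  # training subset
    flags = []
    for trainer in PREDICTORS:
        predicted, spread = trainer(train)
        # Assumed test: the point differs if it lies more than z spreads away.
        flags.append(abs(actual - predicted) > z * max(spread, 1e-9))
    # One reading of "each of the set of predicted data points": all must differ.
    return all(flags), flags

series = [10 + 0.1 * (i % 5) for i in range(30)]
print(detect(series, 25.0))  # a large jump relative to the recent window
print(detect(series, 10.2))  # a value in line with the recent window
```

Requiring every predictor to disagree with the observed point is only one way to combine the per-method results; a vote threshold would be an equally plausible reading.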

19 Claims

1. A method comprising:
- incorporating one or more anomaly detection applications into a computing system, the one or more anomaly detection applications configuring one or more computer processors of the computing system to perform operations for generating a user interface for representing a health of a process executing within the computing system, the operations comprising:
  
  extracting a training time series corresponding to the process from an initial time series corresponding to the process, the training time series including a subset of the initial time series, the subset of the initial time series having a length offset by an index prior to a last data point of the initial time series;
  
  modifying outlier data points in the training time series based on predetermined acceptability criteria;
  
  training a plurality of prediction methods using the training time series;
  
  receiving an actual data point corresponding to the initial time series, the actual data point having an index after the last data point of the training time series;
  
  using the plurality of prediction methods to determine a set of predicted data points corresponding to the actual data point of the initial time series;
  
  determining whether the actual data point is anomalous based on a calculation of whether each of the set of predicted data points is statistically different from the actual data point;
  
  receiving an additional actual data point corresponding to the initial time series and extracting an additional training time series having the length offset by an additional index prior to a last data point of the initial time series, the additional index reflecting a relative position of the actual data point to the additional actual data point; and
  
  performing the generating of the user interface, the generating including providing a visual representation of the initial time series, the visual representation including a visual identification of the determining of whether the actual data point is anomalous and a visual indication of a determining of whether the additional actual data point is anomalous.
- - 2. The method of claim 1, wherein the calculation of whether each of the set of predicted data points is statistically different from the actual data point includes a determination that the Mahalanobis distance between the prediction error and the fitted multivariate normal joint probability distribution of each of the set of predicted data points is within a specified range.
  - 3. The method of claim 1, further comprising selecting the combination of each of the plurality of prediction methods to minimize a number of false anomaly detections.
  - 4. The method of claim 1, wherein the representing of the determination of whether the actual data point is anomalous includes providing a visual indication of a strength of the determination.
  - 5. The method of claim 4, wherein the strength of the determination is based on a number of the plurality of prediction methods that indicate an anomaly with respect to the data point.
  - 6. The method of claim 4, wherein the strength is represented as a size of the visual indication of the strength of the determination of whether the actual data point is anomalous relative to a size of a visual indication of a strength of a determination of whether the additional actual data point is anomalous.
  - 7. The method of claim 1, wherein the training time series represents a window of the initial time series that is recent in relation to the actual data point.
  - 8. The method of claim 1, wherein the generation of the user interface includes providing a magnification element for magnifying a comparison between the actual data point and at least one of the set of predicted data points.
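
Claim 2 refers to the Mahalanobis distance between a prediction error and a fitted multivariate normal distribution. For an error vector e, mean vector mu, and covariance matrix S, that distance is sqrt((e - mu)^T S^-1 (e - mu)). The sketch below works this out for the two-dimensional case (one error per prediction method, two methods). The function names, the sample data, and the use of the sample covariance are assumptions made for illustration; the claim does not state how the distribution is fitted or what the "specified range" is.

```python
from math import sqrt
from statistics import mean

def fit_bivariate_normal(errors):
    """Fit mean and sample covariance to a list of (e1, e2) prediction-error pairs."""
    n = len(errors)
    m1 = mean(e[0] for e in errors)
    m2 = mean(e[1] for e in errors)
    s11 = sum((e[0] - m1) ** 2 for e in errors) / (n - 1)
    s22 = sum((e[1] - m2) ** 2 for e in errors) / (n - 1)
    s12 = sum((e[0] - m1) * (e[1] - m2) for e in errors) / (n - 1)
    return (m1, m2), ((s11, s12), (s12, s22))

def mahalanobis_2d(e, mu, cov):
    """Mahalanobis distance of the error pair e from the fitted distribution."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = e[0] - mu[0], e[1] - mu[1]
    # The inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return sqrt(max(q, 0.0))

# Hypothetical historical errors from two prediction methods.
history = [(1, 0), (-1, 0), (0, 1), (0, -1)]
mu, cov = fit_bivariate_normal(history)
print(mahalanobis_2d((0.0, 0.0), mu, cov))  # distance at the fitted mean
print(mahalanobis_2d((3.0, 3.0), mu, cov))  # distance for a much larger error
```

A caller would then compare the returned distance against whatever range the application specifies.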

9. A system comprising:
- one or more computer processors;
  
  one or more computer memories;
  
  one or more modules incorporated into the one or more computer memories, the one or more modules configuring the one or more computer processors to perform operations for generating a user interface for representing a health of a process executing within a computing system, the operations comprising:
  
  extracting a training time series corresponding to a process from an initial time series corresponding to the process, the training time series including a subset of the initial time series, the subset of the initial time series having a length offset by an index prior to a last data point of the initial time series;
  
  modifying outlier data points in the training time series based on predetermined acceptability criteria;
  
  training a plurality of prediction methods using the training time series;
  
  receiving an actual data point corresponding to the initial time series, the actual data point having an index after the last data point of the training time series;
  
  using the plurality of prediction methods to determine a set of predicted data points corresponding to the actual data point of the initial time series;
  
  determining whether the actual data point is anomalous based on a calculation of whether each of the set of predicted data points is statistically different from the actual data point;
  
  receiving an additional actual data point corresponding to the initial time series and extracting an additional training time series having the length offset by an additional index prior to a last data point of the initial time series, the additional index reflecting a relative position of the actual data point to the additional actual data point; and
  
  performing the generating of the user interface, the generating including providing a visual representation of the initial time series, the visual representation including a visual indication of the determining of whether the actual data point is anomalous and a visual indication of a determining of whether the additional actual data point is anomalous.
- - 10. The system of claim 9, wherein the calculation of whether each of the set of predicted data points is statistically different from the actual data point includes a determination that the Mahalanobis distance between the prediction error and the fitted multivariate normal joint probability distribution of each of the set of predicted data points is within a specified range.
  - 11. The system of claim 9, the operations further comprising selecting the combination of each of the plurality of prediction methods to minimize a number of false anomaly detections.
  - 12. The system of claim 9, wherein the representing of the determination of whether the actual data point is anomalous includes providing a visual indication of a strength of the determination.
  - 13. The system of claim 12, wherein the strength of the determination is based on a number of the plurality of prediction methods that indicate an anomaly with respect to the data point.
  - 14. The system of claim 9, wherein the training time series represents a window of the initial time series that is recent in relation to the actual data point.
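
The training-window language in claims 9 and 14 (a subset with a given length, offset by an index before the last data point, re-extracted as each additional data point arrives) can be read as a sliding window. The sketch below shows one such reading. The function name `training_window` and the parameters `length` and `offset` are hypothetical, and the claims may cover other ways of choosing the subset.

```python
def training_window(initial, length, offset):
    """Return `length` points ending `offset` points before the end of `initial`."""
    end = len(initial) - offset
    start = end - length
    if offset < 0 or length <= 0 or start < 0:
        raise ValueError("window does not fit inside the initial time series")
    return initial[start:end]

series = list(range(10))               # points 0..9; point 9 is the newest
print(training_window(series, 4, 1))   # window used to judge point 9

series.append(10)                      # an additional actual data point arrives
print(training_window(series, 4, 1))   # the window slides forward by one
```

Keeping the window short and recent is what lets the predictors track a non-stationary series: older behaviour drops out of the training data as new points arrive.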

15. A non-transitory machine-readable medium comprising a set of instructions that, when executed by one or more processors, causes the one or more processors to perform operations for generating a user interface for representing a health of a process executing within a computing system, the operations comprising:
- extracting a training time series corresponding to a process from an initial time series corresponding to the process, the training time series including a subset of the initial time series, the subset of the initial time series having a length offset by an index prior to a last data point of the initial time series;
  
  modifying outlier data points in the training time series based on predetermined acceptability criteria;
  
  training a plurality of prediction methods using the training time series;
  
  receiving an actual data point corresponding to the initial time series, the actual data point having an index after the last data point of the training time series;
  
  using the plurality of prediction methods to determine a set of predicted data points corresponding to the actual data point of the initial time series;
  
  determining whether the actual data point is anomalous based on a calculation of whether each of the set of predicted data points is statistically different from the actual data point;
  
  receiving an additional actual data point corresponding to the initial time series and extracting an additional training time series having the length offset by an additional index prior to a last data point of the initial time series, the additional index reflecting a relative position of the actual data point to the additional actual data point; and
  
  performing the generating of the user interface, the generating including providing a visual representation of the initial time series, the visual representation including a visual indication of the determining of whether the actual data point is anomalous and a visual indication of a determining of whether the additional actual data point is anomalous.
- - 16. The non-transitory machine readable medium of claim 15, wherein the calculation of whether each of the set of predicted data points is statistically different from the actual data point includes a determination that the Mahalanobis distance between the prediction error and the fitted multivariate normal joint probability distribution of each of the set of predicted data points is within a specified range.
  - 17. The non-transitory machine readable medium of claim 15, further comprising selecting the combination of each of the plurality of prediction methods to minimize a number of false anomaly detections.
  - 18. The non-transitory machine readable medium of claim 15, wherein the representing of the determination of whether the actual data point is anomalous includes providing a visual indication of a strength of the determination.
  - 19. The non-transitory machine readable medium of claim 18, wherein the strength of the determination is based on a number of the plurality of prediction methods that indicate an anomaly with respect to the data point.
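
Claims 5, 13, and 19 base the strength of a determination on how many prediction methods indicate an anomaly, and claim 6 represents strength as the relative size of a visual indication. A minimal sketch of that idea follows. The linear size mapping and the names `anomaly_strength`, `marker_size`, `base`, and `max_extra` are assumptions for illustration only; the claims do not prescribe a formula.

```python
def anomaly_strength(flags):
    """Count how many prediction methods flagged the data point."""
    return sum(1 for flagged in flags if flagged)

def marker_size(strength, n_methods, base=4.0, max_extra=8.0):
    """Map strength to a plot-marker size (assumed linear scaling)."""
    if n_methods <= 0:
        raise ValueError("n_methods must be positive")
    return base + max_extra * strength / n_methods

flags_a = [True, False, True, False]  # hypothetical per-method results, point A
flags_b = [True, True, True, True]    # hypothetical per-method results, point B
for flags in (flags_a, flags_b):
    s = anomaly_strength(flags)
    print(s, marker_size(s, len(flags)))
```

Under this mapping, a point flagged by every method is drawn larger than one flagged by only some of them, so the two markers can be compared by size.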

Current Assignee: eBay Inc.
Original Assignee: eBay Inc.
Inventors: Moghtaderi, Azadeh; Bawa, Gagan Singh; Schwarzbach, David
Primary Examiner(s): Sitiriche, Luis A
Application Number: US14/588,355
Publication Number: US 20160189041A1
Time in Patent Office: 1,749 Days
Field of Search: None
CPC Class Codes:
G06N 20/00 Machine learning
G06N 5/04 Inference or reasoning models
