Strategies for identifying anomalies in time-series data

US 7,716,011 B2
Filed: 02/28/2007
Issued: 05/11/2010
Est. Priority Date: 02/28/2007
Status: Expired due to Fees

Abstract

A strategy is described for identifying anomalies in time-series data. The strategy involves dividing the time-series data into a plurality of collected data segments and then using a modeling technique to fit local models to the collected data segments. Large deviations of the time-series data from the local models are indicative of anomalies. In one approach, the modeling technique can use an absolute value (L1) measure of error value for all of the collected data segments. In another approach, the modeling technique can use the L1 measure for only those portions of the time-series data that are projected to be anomalous. The modeling technique can use a squared-term (L2) measure of error value for normal portions of the time-series data. In another approach, the modeling technique can use an iterative expectation-maximization strategy in applying the L1 and L2 measures.
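
The segment-and-fit idea in the abstract can be illustrated with a minimal Python sketch. It assumes each local model is a constant level over a fixed-length segment (the patent does not restrict local models to constants), and it relies on the fact that the median minimizes the L1 error while the mean minimizes the L2 error. The function names, the segment length, the fixed `threshold`, and the sample data are all illustrative assumptions, not details taken from the patent.

```python
from statistics import mean, median


def split_segments(series, segment_len):
    """Divide the series into consecutive fixed-length segments."""
    return [series[i:i + segment_len] for i in range(0, len(series), segment_len)]


def fit_global_model(series, segment_len, measure="L1"):
    """Fit one constant local model per segment and concatenate them.

    The median minimizes the sum of absolute errors (L1);
    the mean minimizes the sum of squared errors (L2).
    """
    model = []
    for seg in split_segments(series, segment_len):
        level = median(seg) if measure == "L1" else mean(seg)
        model.extend([level] * len(seg))
    return model


def detect_anomalies(series, model, threshold):
    """Return indices whose absolute deviation from the model exceeds threshold."""
    return [i for i, (x, m) in enumerate(zip(series, model)) if abs(x - m) > threshold]


# Hypothetical traffic counts with one spike at index 6 and a level shift at index 10.
traffic = [10, 11, 10, 9, 10, 10, 50, 11, 10, 9, 20, 21, 20, 19, 20]

l1_model = fit_global_model(traffic, segment_len=5, measure="L1")
l2_model = fit_global_model(traffic, segment_len=5, measure="L2")
anomalies = detect_anomalies(traffic, l1_model, threshold=10.0)
```

In this toy data the spike pulls the L2 level of the middle segment well above the typical values, while the L1 level stays at the typical value. That robustness is the reason for preferring the L1 measure on portions suspected to be anomalous.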

17 Claims

1. A computerized method for detecting one or more anomalies in time-series data, comprising:
- collecting time-series data from an environment to provide collected time-series data, the collected time-series data having a plurality of portions;
  
  dividing the collected time-series data into a plurality of collected data segments;
  
  fitting a plurality of local models to the respective plurality of collected data segments, the plurality of local models collectively forming a global model; and
  
  determining whether there is at least one anomaly in the collected time-series data or no anomalies based on a comparison between the collected time-series data and the global model, wherein the fitting selects a type of model-fitting paradigm to be applied to the collected time-series data to generate the plurality of local models on a portion-by-portion basis, wherein the fitting selects the type of model-fitting paradigm based on an error value metric, the error value metric corresponding to a difference between a point in the collected time-series data and a corresponding model point, wherein the fitting selects a first model-fitting paradigm that relies on an absolute value (L1) measure of the error value metric when a portion of the collected time-series data under consideration is considered anomalous, wherein the fitting selects another model-fitting paradigm that relies on a squared-term (L2) measure of the error value metric when the portion under consideration is considered normal.
- - 2. The computerized method of claim 1, wherein the plurality of portions correspond to a plurality of respective data points.
  - 3. The computerized method of claim 1, wherein the fitting uses an iterative procedure to define whether the portion under consideration is considered anomalous or normal, wherein the iterative procedure provides an opportunity to redefine the portion under consideration from one iteration to a next iteration.
  - 4. The computerized method of claim 3, wherein the iterative procedure is an expectation-maximization procedure, wherein an expectation stage of the expectation-maximization procedure involves labeling the portion under consideration as anomalous or normal, and wherein a maximization stage of the expectation-maximization procedure involves generating parameters used to provide a local model for the portion under consideration, the maximization stage relying on the labeling performed by the expectation stage.
  - 5. The computerized method of claim 1, wherein an abnormal portion is assumed to fall around a local model with a first probability distribution and a normal portion is assumed to fall around a local model with a second probability distribution, wherein the portion under consideration is labeled as abnormal or normal depending on the probability distribution deemed most likely to apply to the portion under consideration.
  - 6. The computerized method of claim 5, wherein the first probability distribution is a Laplacian distribution and the second probability distribution is a Gaussian distribution.
  - 7. The computerized method of claim 1, wherein an abnormal portion is assumed to fall around a local model with a first probability distribution and a normal portion is assumed to fall around a local model with a second probability distribution, wherein the portion under consideration is labeled as abnormal or normal depending on the probability distribution deemed most likely to apply to the portion under consideration.
  - 8. The computerized method of claim 1, wherein the collected time-series data reflects transactions that occur within the environment.
  - 9. The computerized method of claim 1, wherein the collected time-series data reflects traffic within a network environment, and wherein said at least one anomaly is associated with either a dramatic increase or decrease in the traffic.
  - 10. The computerized method of claim 1, wherein the collected time-series data comprises a plurality of instances of collected time-series data that have been collected from a plurality of respective collection points in the environment, and wherein the dividing, fitting, and determining are performed on the plurality of instances of collected time-series data in a single batch operation.
  - 11. One or more machine-readable storage media containing machine-readable instructions for implementing the computerized method of claim 1.
  - 12. One or more computing devices, comprising:
    - one or more processors; and
      
      memory to store computer-executable instructions that, when executed by the one or more processors, perform the computerized method of claim 1.

13. A computerized method for detecting one or more anomalies in time-series data, comprising:
- collecting time-series data from an environment to provide collected time-series data, the collected time-series data having a plurality of portions;
  
  dividing the collected time-series data into a plurality of collected data segments;
  
  labeling portions of the collected time-series data as either anomalous or normal;
  
  fitting a plurality of local models to the respective plurality of collected data segments, the plurality of local models collectively forming a global model, wherein the fitting uses a first model-fitting paradigm for any portion of the time-series data that is considered anomalous and a second model-fitting paradigm for any portion of the time-series data that is considered normal, wherein the first model-fitting paradigm involves using an absolute value (L1) measure to represent an error value metric, and wherein the second model-fitting paradigm involves using a squared-term (L2) measure to represent the error value metric, the error value metric corresponding to a difference between a point in the collected time-series data and a corresponding model point; and
  
  determining whether there is at least one anomaly in the collected time-series data or no anomalies based on a comparison between the collected time-series data and the global model.
- - 14. The computerized method of claim 13, further comprising repeating the labeling and fitting a plurality of times using an expectation-maximization procedure.
  - 15. One or more machine-readable storage media containing machine-readable instructions for implementing the computerized method of claim 13.
  - 16. One or more computing devices, comprising:
    - one or more processors; and
      
      memory to store computer-executable instructions that, when executed by the one or more processors, perform the computerized method of claim 13.

17. An analysis system for detecting one or more anomalies in time-series data, comprising:
- a data receiving module configured to collect time-series data from an environment to provide collected time-series data;
  
  an anomaly analysis module configured to:
  
  divide the collected time-series data into a plurality of collected data segments, the collected time-series data having a plurality of portions;
  
  fit a plurality of local models to the respective plurality of collected data segments using a plurality of different model-fitting paradigms, the plurality of local models collectively forming a global model; and
  
  identify at least one anomaly in the collected time-series data based on a comparison between the collected time-series data and the global model, to thereby provide an output result, wherein the fitting selects the plurality of different model-fitting paradigms to achieve a desired combination of accuracy and computational processing speed, wherein the fitting uses a first model-fitting paradigm for any portion of the time-series data that is considered anomalous and a second model-fitting paradigm for any portion of the time-series data that is considered normal, wherein the first model-fitting paradigm involves using an absolute value (L1) measure to represent an error value metric, and wherein the second model-fitting paradigm involves using a squared-term (L2) measure to represent the error value metric, the error value metric corresponding to a difference between a point in the collected time-series data and a corresponding model point; and
  
  an output module configured to provide the output result.
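
The label-then-fit iteration recited in claims 3 to 6 and 14 can be sketched as a hard expectation-maximization loop over a single segment. This is only an illustration under stated assumptions: the local model is a constant level `m`, the Gaussian scale `sigma` and Laplacian scale `b` are fixed hypothetical hyperparameters rather than estimated values, the loop starts by treating every point as anomalous, and the mixed L1/L2 cost is minimized with a simple ternary search. None of these choices, nor the function names, come from the patent.

```python
import math


def label_points(segment, m, sigma, b):
    """Expectation stage: mark a point anomalous (True) when the Laplacian
    log-density of its residual exceeds the Gaussian log-density."""
    labels = []
    for x in segment:
        r = abs(x - m)
        log_gauss = -0.5 * math.log(2 * math.pi * sigma ** 2) - r * r / (2 * sigma ** 2)
        log_laplace = -math.log(2 * b) - r / b
        labels.append(log_laplace > log_gauss)
    return labels


def fit_level(segment, labels, sigma, b, iters=100):
    """Maximization stage: find the constant level minimizing an L1 cost on
    anomalous points plus an L2 cost on normal points (a convex function)."""
    def cost(m):
        total = 0.0
        for x, is_anomalous in zip(segment, labels):
            r = x - m
            total += abs(r) / b if is_anomalous else r * r / (2 * sigma ** 2)
        return total

    lo, hi = min(segment), max(segment)
    for _ in range(iters):  # ternary search on a convex cost
        left = lo + (hi - lo) / 3
        right = hi - (hi - lo) / 3
        if cost(left) < cost(right):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2


def em_fit_segment(segment, sigma=1.0, b=10.0, max_iters=20):
    """Alternate labeling and fitting until the labels stop changing."""
    labels = [True] * len(segment)  # assumption: start from a pure L1 fit
    m = fit_level(segment, labels, sigma, b)
    for _ in range(max_iters):
        new_labels = label_points(segment, m, sigma, b)
        if new_labels == labels:
            break
        labels = new_labels
        m = fit_level(segment, labels, sigma, b)
    return m, labels


# Hypothetical segment containing one spike.
level, labels = em_fit_segment([10, 50, 11, 10, 9])
```

In this sketch a label can change from one iteration to the next, which mirrors the opportunity to redefine a portion described in claim 3. The spike contributes only a bounded L1 pull on the fitted level, while the remaining points are fitted with the L2 measure.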


Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Thibaux, Romain J., Platt, John C., Maltz, David A., Kiciman, Emre M.
Primary Examiner(s)
Feliciano; Eliseo Ramos
Assistant Examiner(s)
Ngon; Ricky

Application Number

US11/680,590
Publication Number

US 20080208526A1
Time in Patent Office

1,168 Days
Field of Search

702/179-181, 702/185, 705/35
US Class Current

702/179
CPC Class Codes

G06F 2218/12 Classification; Matching

H04L 63/1425 Traffic logging, e.g. anoma...
