Method and system for detecting anomalies in time series data
Abstract
A server system stores time series data for a data source. The time series data comprises a plurality of time-value pairs, each pair including a value associated with an attribute of the data source and a time. For a particular attribute, the server system generates a plurality of forecasting models for characterizing the time-value pairs, each model including an estimated attribute value and an associated error-variance. For a time-value pair, the server system determines a plurality of differences between the value of the time-value pair and respective estimated attribute values of the plurality of forecasting models and tags the time-value pair as an anomaly if the differences for at least a first subset of the forecasting models are greater than the corresponding error variances. In response to a request from a client application, the server system returns at least a subset of the time-value pairs tagged as anomalies.
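The tagging step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the moving-window mean used as a forecasting model, the window lengths, and the `min_models` parameter (the size of the "first subset") are all assumptions introduced here. The comparison of each difference directly against the error-variance follows the wording of the abstract.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import List, Sequence


@dataclass
class ForecastModel:
    estimate: float        # estimated attribute value
    error_variance: float  # associated error-variance


def fit_window_mean(values: Sequence[float]) -> ForecastModel:
    # Hypothetical model: the window mean as the estimate and the
    # population variance of the window as the error-variance.
    return ForecastModel(mean(values), pvariance(values))


def build_models(history: Sequence[float], windows: Sequence[int]) -> List[ForecastModel]:
    # One model per subset of the series; here each subset is the last w values.
    return [fit_window_mean(history[-w:]) for w in windows if len(history) >= w]


def differences(value: float, models: Sequence[ForecastModel]) -> List[float]:
    # Absolute difference between the observed value and each model's estimate.
    return [abs(value - m.estimate) for m in models]


def is_anomaly(value: float, models: Sequence[ForecastModel], min_models: int) -> bool:
    # Tag the value if its difference exceeds the error-variance
    # for at least min_models of the forecasting models.
    diffs = differences(value, models)
    exceeded = sum(1 for d, m in zip(diffs, models) if d > m.error_variance)
    return exceeded >= min_models


history = [10, 11, 9, 10, 12, 10, 11, 9]
models = build_models(history, windows=(4, 8))
print(is_anomaly(25, models, min_models=2))    # far from both estimates
print(is_anomaly(10.5, models, min_models=1))  # close to both estimates
```

Using several windows of different lengths means a value is judged against both short-term and long-term behavior of the series, which is one plausible reading of "a plurality of forecasting models".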
29 Claims
1. A computer-implemented method for identifying anomalies in time series data, comprising:
at a computer server having one or more processors and memory for storing programs to be executed by the one or more processors:
storing in a database time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value;
for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each forecasting model including an estimated attribute value and an associated error-variance;
for a respective time-value pair associated with the particular attribute:
determining a plurality of differences between the value of the time-value pair and respective estimated attribute values of the plurality of forecasting models;
tagging the time-value pair as an anomaly if the differences for at least a first subset of the forecasting models are greater than the corresponding error variances; and
determining a significance factor such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
in response to a request from a client application for analytics information for the data source, the request including a predefined significance threshold for one or more of the attributes, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes if their respective significance factors exceed the predefined significance threshold.
Dependent claims: 2, 3, 4, 5, 6, 7.
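The significance factor and the threshold-based reporting in claim 1 can be sketched as follows. This is one possible reading, offered as an assumption: the factor is taken as the boundary multiplier k at which at least `second_subset_size` models have a difference no larger than their error-variance times k, that is, the n-th smallest ratio of difference to error-variance. The names `TaggedPair`, `significance_factor`, `report`, and `second_subset_size` are hypothetical and do not come from the patent.

```python
from typing import List, NamedTuple, Sequence


class TaggedPair(NamedTuple):
    time: int
    value: float
    significance: float


def _ratio(diff: float, error_variance: float) -> float:
    # Ratio of difference to error-variance; a zero variance with a
    # non-zero difference is treated as infinitely significant.
    if error_variance == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / error_variance


def significance_factor(
    diffs: Sequence[float],
    error_variances: Sequence[float],
    second_subset_size: int,
) -> float:
    # Boundary multiplier k: any factor above k makes the difference smaller
    # than error_variance * factor for at least second_subset_size models.
    if not 1 <= second_subset_size <= len(diffs):
        raise ValueError("second_subset_size must be between 1 and the number of models")
    ratios = sorted(_ratio(d, v) for d, v in zip(diffs, error_variances))
    return ratios[second_subset_size - 1]


def report(tagged: Sequence[TaggedPair], threshold: float) -> List[TaggedPair]:
    # Return only the tagged pairs whose significance exceeds the
    # threshold supplied in the client request.
    return [p for p in tagged if p.significance > threshold]


k = significance_factor([6.0, 9.0, 2.0], [2.0, 3.0, 4.0], second_subset_size=2)
tagged = [TaggedPair(1, 25.0, 15.7), TaggedPair(2, 13.0, 2.1)]
print(k, report(tagged, threshold=5.0))
```

Under this reading, a larger factor means the value sits further outside what most models expect, so a client can raise the threshold to see only the most pronounced anomalies.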
8. A server system for identifying anomalies in time series data, comprising:
one or more processors for executing programs; and
memory to store data and to store one or more programs to be executed by the one or more processors, the one or more programs including instructions for:
storing in a database time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value;
for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each forecasting model including an estimated attribute value and an associated error-variance;
for a respective time-value pair associated with the particular attribute:
determining a plurality of differences between the value of the time-value pair and respective estimated attribute values of the plurality of forecasting models;
tagging the time-value pair as an anomaly if the differences for at least a first subset of the forecasting models are greater than the corresponding error variances; and
determining a significance factor such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
in response to a request from a client application for analytics information for the data source, the request including a predefined significance threshold for one or more of the attributes, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes if their respective significance factors exceed the predefined significance threshold.
Dependent claims: 9, 10, 11, 12, 13.
14. A non-transitory computer readable-storage medium storing one or more programs for execution by one or more processors of a server system for identifying anomalies in time series data, the one or more programs comprising instructions for:
storing in a database time series data for a data source, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one or more attributes associated with the data source and a time associated with the value;
for a particular attribute, generating a plurality of forecasting models for characterizing the time-value pairs in a respective subset of the time series data, each forecasting model including an estimated attribute value and an associated error-variance;
for a respective time-value pair associated with the particular attribute:
determining a plurality of differences between the value of the time-value pair and respective estimated attribute values of the plurality of forecasting models; and
tagging the time-value pair as an anomaly if the differences for at least a first subset of the forecasting models are greater than the corresponding error variances; and
determining a significance factor such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
in response to a request from a client application for analytics information for the data source, the request including a predefined significance threshold for one or more of the attributes, reporting to the client application at least a subset of the time-value pairs tagged as anomalies for one or more of the attributes if their respective significance factors exceed the predefined significance threshold.
Dependent claims: 15, 16, 17, 18, 19.
20. A computer system for detecting events of interest in time series data, comprising:
one or more processors for executing programs; and
memory to store data and to store one or more programs to be executed by the one or more processors, the one or more programs including:
a time series data collection module configured to collect time series data at one or more predefined time intervals from a plurality of data sources, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one attribute associated with the data sources and a time during which the value was collected;
a time series storage module configured to store in the memory the collected time series data such that a time-value pair is added to the stored time series data for a respective collection of time series data without disturbing the stored time series data for the respective collection;
an event detection module configured to determine whether a new time-value pair is an event of interest with reference to its associated collection of time series data, including one or more sub-modules for:
generating a plurality of forecasting models for characterizing different subsets of the collection of time series data, each forecasting model including an estimated attribute value and an associated error-variance;
determining whether the new time-value pair is within a scope defined by the estimated attribute value and the error-variance for each of the plurality of forecasting models;
tagging the new time-value pair as an event of interest if the new time-value pair is outside the respective scopes for at least a first subset of the forecasting models; and
determining a significance factor such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
an event storage module configured to store the tagged time-value pairs and their respective significance factors such that the tagged time-value pairs are ready to be served in response to a request for events of interest from a client application if their respective significance factors exceed a predefined significance threshold in the request.
Dependent claims: 21, 22, 23.
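The storage modules recited in claim 20 can be sketched as two small classes. This is an illustrative sketch under stated assumptions: the class names `TimeSeriesStore` and `EventStore`, the string collection keys, and the in-memory dictionaries are hypothetical choices made here, and the claim does not prescribe any particular data structure. The append-only list is one simple way to add a pair "without disturbing" the stored data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (time, value)


class TimeSeriesStore:
    """Append-only storage: new pairs are added without modifying earlier ones."""

    def __init__(self) -> None:
        self._series: Dict[str, List[Pair]] = defaultdict(list)

    def append(self, key: str, pair: Pair) -> None:
        # Only ever appends; existing entries are never rewritten or reordered.
        self._series[key].append(pair)

    def collection(self, key: str) -> Tuple[Pair, ...]:
        # Return an immutable snapshot so callers cannot alter stored data.
        return tuple(self._series.get(key, ()))


class EventStore:
    """Holds tagged pairs together with their significance factors."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Tuple[Pair, float]]] = defaultdict(list)

    def add(self, key: str, pair: Pair, significance: float) -> None:
        self._events[key].append((pair, significance))

    def serve(self, key: str, threshold: float) -> List[Pair]:
        # Serve only events whose significance exceeds the request's threshold.
        return [p for p, s in self._events.get(key, []) if s > threshold]


store = TimeSeriesStore()
store.append("cpu", (1, 10.0))
store.append("cpu", (2, 11.0))

events = EventStore()
events.add("cpu", (3, 25.0), significance=15.7)
events.add("cpu", (4, 13.0), significance=2.1)
print(store.collection("cpu"), events.serve("cpu", threshold=5.0))
```

Storing the significance factor next to each tagged pair lets the threshold be applied at request time, so different clients can ask for different thresholds without recomputing the detection step.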
24. A computer-implemented method for detecting events of interest in time series data, comprising:
at a computer server having one or more processors and memory for storing programs to be executed by the one or more processors:
collecting time series data at one or more predefined time intervals from a plurality of data sources, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one attribute associated with the data sources and a time during which the value was collected;
storing in memory the collected time series data such that a time-value pair is added to the stored time series data for a respective collection of time series data without disturbing the stored time series data for the respective collection;
determining whether a new time-value pair is an event of interest with reference to its associated collection of time series data, further including:
generating a plurality of forecasting models characterizing different subsets of the associated collection of time series data, each forecasting model including an estimated attribute value and an associated error-variance;
determining whether the particular new time-value pair is within the associated error-variance for each of the plurality of forecasting models; and
tagging the particular time-value pair as an anomaly when the value of the particular time-value pair is outside the error-variance for at least a first subset of the forecasting models; and
determining a significance factor for the particular time-value pair such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
storing the time-value pairs tagged as anomalies and their respective significance factors such that the stored time-value pairs are ready to be served to a user at a client application in response to a user request for the anomalies if their respective significance factors exceed a predefined significance threshold in the user request.
Dependent claims: 25, 26, 27.
28. A non-transitory computer readable-storage medium storing one or more programs for execution by one or more processors of a server system for identifying anomalies in time series data, the one or more programs comprising instructions for:
collecting time series data at one or more predefined time intervals from a plurality of data sources, wherein the time series data comprises a plurality of time-value pairs, each pair including a value of one attribute associated with the data sources and a time when the value was collected;
storing in a computer memory the collected time series data such that, when a new time-value pair is collected by the time series data collector, the new time-value pair is added to the stored time series data for a respective collection of time series data without disturbing the previously stored time series data for the respective collection;
determining for a particular new time-value pair whether the particular new time-value pair is an anomaly with reference to its associated collection of time series data, including:
generating a plurality of forecasting models characterizing different subsets of the associated collection of time series data, each forecasting model including an estimated attribute value and an associated error-variance;
determining whether the particular new time-value pair is within the associated error-variance for each of the plurality of forecasting models; and
tagging the particular time-value pair as an anomaly when the value of the particular time-value pair is outside the error-variance for at least a first subset of the forecasting models; and
determining a significance factor for the particular time-value pair such that the respective differences for at least a second subset of the forecasting models are smaller than the corresponding error-variances multiplied by the significance factor, wherein the first subset is within the second subset; and
storing the time-value pairs tagged as anomalies and their respective significance factors such that the stored time-value pairs are ready to be served to a user at a client application in response to a user request for the anomalies if their respective significance factors exceed a predefined significance threshold in the user request.
Dependent claims: 29.
Specification