Systems and methods for clustering time series data based on forecast distributions
First Claim
1. A computer-implemented method, comprising:
accessing a set of variables, wherein accessing is performed at a computing device;
accessing time series data with respect to each of the variables of the set;
with respect to each of the variables of the set, forecasting values and determining a distribution of the forecasted values, wherein forecasting includes using a forecasting model and the time series data accessed with respect to each of the variables;
identifying all possible variable pairs, wherein each of the variable pairs consists of two of the variables of the set;
with respect to each of the variable pairs, calculating a divergence metric using the distributions of the forecasted values for the variables of the respective variable pair; and
defining clusters such that at least one of the clusters includes two or more of the variables of the set, wherein defining the clusters is performed using a hierarchical clustering algorithm and is based on the calculated divergence metrics, wherein each of the divergence metrics is a symmetric Kullback-Leibler divergence metric, wherein, with respect to each of the variable pairs, the calculated symmetric Kullback-Leibler divergence metric is equal to;
Abstract
In accordance with the teachings described herein, systems and methods are provided for clustering time series based on forecast distributions. A method for clustering time series based on forecast distributions may include: receiving time series data relating to one or more aspects of a physical process; applying a forecasting model to the time series data to generate forecasted values and confidence intervals associated with the forecasted values, the confidence intervals being generated based on distribution information relating to the forecasted values; generating a distance matrix that identifies divergence in the forecasted values, the distance matrix being generated based on the distribution information relating to the forecasted values; and performing a clustering operation on the plurality of forecasted values based on the distance matrix. The distance matrix may be generated using a symmetric Kullback-Leibler divergence algorithm.
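The steps in the abstract (forecast, build a divergence-based distance matrix, cluster) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the stand-in forecasting model (`forecast_distribution`, which simply uses the sample mean and variance), the choice of normal forecast distributions, the textbook symmetrised form KL(p||q) + KL(q||p), and average-linkage agglomeration as the hierarchical clustering step. The function names and the sample data are hypothetical.

```python
import math
import statistics
from itertools import combinations

def forecast_distribution(series):
    """Stand-in forecasting model (assumption): a normal forecast whose
    mean and variance are the sample mean and sample variance."""
    return statistics.fmean(series), statistics.variance(series)

def symmetric_kl_normal(mu_p, var_p, mu_q, var_q):
    """Symmetrised KL divergence, KL(p||q) + KL(q||p), for two univariate
    normal distributions. Assumes both variances are strictly positive."""
    sq_diff = (mu_p - mu_q) ** 2
    kl_pq = 0.5 * (math.log(var_q / var_p) + (var_p + sq_diff) / var_q - 1.0)
    kl_qp = 0.5 * (math.log(var_p / var_q) + (var_q + sq_diff) / var_p - 1.0)
    return kl_pq + kl_qp

def pairwise_divergences(dists):
    """Divergence for every unordered pair of variables (the distance matrix)."""
    return {
        frozenset((a, b)): symmetric_kl_normal(*dists[a], *dists[b])
        for a, b in combinations(sorted(dists), 2)
    }

def agglomerate(names, div, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[name] for name in names]

    def linkage(a, b):
        # Mean divergence over all cross-cluster variable pairs.
        total = sum(div[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average divergence.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters

# Hypothetical time series for three variables.
series = {
    "a": [10, 11, 9, 10, 12],
    "b": [10, 12, 9, 11, 11],
    "c": [50, 52, 49, 51, 50],
}
dists = {name: forecast_distribution(values) for name, values in series.items()}
divergences = pairwise_divergences(dists)
clusters = agglomerate(sorted(dists), divergences, n_clusters=2)
print(clusters)
```

With this data, the variables `a` and `b` have nearly identical forecast distributions and end up in one cluster, while `c` stays on its own. A production implementation would replace the stand-in forecaster with a fitted time series model and would likely use a library routine for the hierarchical clustering.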
19 Claims
1. A computer-implemented method, comprising:
accessing a set of variables, wherein accessing is performed at a computing device;
accessing time series data with respect to each of the variables of the set;
with respect to each of the variables of the set, forecasting values and determining a distribution of the forecasted values, wherein forecasting includes using a forecasting model and the time series data accessed with respect to each of the variables;
identifying all possible variable pairs, wherein each of the variable pairs consists of two of the variables of the set;
with respect to each of the variable pairs, calculating a divergence metric using the distributions of the forecasted values for the variables of the respective variable pair; and
defining clusters such that at least one of the clusters includes two or more of the variables of the set, wherein defining the clusters is performed using a hierarchical clustering algorithm and is based on the calculated divergence metrics, wherein each of the divergence metrics is a symmetric Kullback-Leibler divergence metric, wherein, with respect to each of the variable pairs, the calculated symmetric Kullback-Leibler divergence metric is equal to;
Dependent claims: 2, 3, 4, 5, 6, 17
7. A computer-implemented system, comprising:
one or more data processors; and
one or more non-transitory computer-readable storage media containing instructions operable to cause the one or more processors to perform operations including:
accessing a set of variables;
accessing time series data with respect to each of the variables of the set;
with respect to each of the variables of the set, forecasting values and determining a distribution of the forecasted values, wherein forecasting includes using a forecasting model and the time series data accessed with respect to each of the variables;
identifying all possible variable pairs, wherein each of the variable pairs consists of two of the variables of the set;
with respect to each of the variable pairs, calculating a divergence metric using the distributions of the forecasted values for the variables of the respective variable pair; and
defining clusters such that at least one of the clusters includes two or more of the variables of the set, wherein defining the clusters is performed using a hierarchical clustering algorithm and is based on the calculated divergence metrics, wherein each of the divergence metrics is a symmetric Kullback-Leibler divergence metric, wherein, with respect to each of the variable pairs, the calculated symmetric Kullback-Leibler divergence metric is equal to;
Dependent claims: 8, 9, 10, 11, 18
12. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a data processing apparatus to perform operations including:
accessing a set of variables;
accessing time series data with respect to each of the variables of the set;
with respect to each of the variables of the set, forecasting values and determining a distribution of the forecasted values, wherein forecasting includes using a forecasting model and the time series data accessed with respect to each of the variables;
identifying all possible variable pairs, wherein each of the variable pairs consists of two of the variables of the set;
with respect to each of the variable pairs, calculating a divergence metric using the distributions of the forecasted values for the variables of the respective variable pair; and
defining clusters such that at least one of the clusters includes two or more of the variables of the set, wherein defining the clusters is performed using a hierarchical clustering algorithm and is based on the calculated divergence metrics, wherein each of the divergence metrics is a symmetric Kullback-Leibler divergence metric, wherein, with respect to each of the variable pairs, the calculated symmetric Kullback-Leibler divergence metric is equal to;
Dependent claims: 13, 14, 15, 16, 19
Specification