Bayesian approach for learning regression decision graph models and regression models for time series analysis
First Claim
1. A computer-implemented data analysis method that makes predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
employing a Bayesian model selection approach to construct a decision graph based on the data relating to observations of time series data, the decision graph having a model structure that includes at least two leaves, at least one leaf of the decision graph including at least one nontrivial linear regression, wherein the Bayesian model selection approach comprises a greedy search algorithm to grow the model by adding leaves to the model so long as the model improves and performing a merge operation on the leaves after the model has more than two leaves;
providing a set of potential regressors having variables associated with the data, wherein the potential regressors are ordered in descending order according to their correlation relative to a target variable to be predicted, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
computing a Bayesian score for a split leaf model and a merge model;
storing the model with the higher computed Bayesian score;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
terminating the iterative process if a regressor does not improve the model score;
employing the decision graph to predict future observations in the time series data;
employing a split leaf operator at one leaf of the decision graph to grow the decision graph to include additional leaves, each of the additional leaves including at least one linear regression on at least one variable of the set of potential regressors;
storing or displaying the predicted future observations of the time series data, wherein the predicted future observations are based on the decision graph and include implicit, previously unknown information obtained from mining the data.
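The greedy split/merge search with Bayesian scoring recited above can be sketched in code. This is a minimal illustration under assumed details, not the patented method itself: leaves are modeled as point sets over a single variable, the Bayesian score is approximated by the BIC of a per-leaf least-squares line (the claim does not fix a particular scoring formula), and splits are made at the leaf median.

```python
import math

def fit_leaf(pts):
    """Ordinary least squares y = a + b*x over one leaf; returns the
    residual sum of squares of the fit."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    b = sum((x - mx) * (y - my) for x, y in pts) / sxx if sxx else 0.0
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

def leaf_score(pts):
    """BIC-style proxy for the Bayesian score of one leaf (2 parameters)."""
    n = len(pts)
    sigma2 = max(fit_leaf(pts) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return loglik - math.log(n)  # penalty: 0.5 * 2 params * log n

def greedy_graph(points, max_iter=20):
    """Grow a piecewise-linear model by greedy split/merge: at each step
    keep the candidate model with the higher total score, and terminate
    when no operation improves the score."""
    leaves = [sorted(points)]
    score = sum(leaf_score(l) for l in leaves)
    for _ in range(max_iter):
        best = None
        # Split operator: cut a (sorted) leaf at its median x.
        for i, leaf in enumerate(leaves):
            mid = len(leaf) // 2
            lo, hi = leaf[:mid], leaf[mid:]
            if len(lo) < 3 or len(hi) < 3:
                continue
            cand = leaves[:i] + leaves[i + 1:] + [lo, hi]
            s = sum(leaf_score(l) for l in cand)
            if best is None or s > best[0]:
                best = (s, cand)
        # Merge operator: applied only once the model has more than two leaves.
        if len(leaves) > 2:
            for i in range(len(leaves)):
                for j in range(i + 1, len(leaves)):
                    merged = sorted(leaves[i] + leaves[j])
                    cand = [l for k, l in enumerate(leaves)
                            if k not in (i, j)] + [merged]
                    s = sum(leaf_score(l) for l in cand)
                    if best is None or s > best[0]:
                        best = (s, cand)
        if best is None or best[0] <= score:
            break  # no split or merge improves the Bayesian score
        score, leaves = best
    return leaves, score
```

On data with a regime change (for example, slope 2 for x at or above zero and slope -3 below), the search typically accepts at least one split, yielding the piecewise-linear structure the claim describes.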
2 Assignments
0 Petitions
Abstract
Methods and systems are disclosed for learning a regression decision graph model using a Bayesian model selection approach. In a disclosed aspect, the model structure and/or model parameters can be learned using a greedy search algorithm applied to grow the model so long as the model improves. This approach enables construction of a decision graph having a model structure that includes a plurality of leaves, at least one of which includes a non-trivial linear regression. The resulting model thus can be employed for forecasting, such as for time series data, which can include single or multi-step forecasting.
133 Citations
44 Claims
1. (Independent claim; text identical to the First Claim reproduced above.)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A system that facilitates forecasting of time series data, comprising:
a computer processor that stores, in a memory communicatively coupled to the computer processor, computer-executable instructions, the execution of which by the processor makes predictions relative to time series data and performs data mining of large amounts of data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by the data mining from the large amounts of data;
a model generator that generates a decision graph employed for predicting future observations of time series data, the decision graph being structured as a regression model via a greedy search algorithm and comprising:
at least one non-leaf node associated with a Boolean function for one of a plurality of previous variables for the time series;
a plurality of leaves, each of the plurality of leaves being associated with at least one functional formula corresponding to a non-trivial linear regression for previous observations in the time series data;
respective edges associating the respective functional formulas of the plurality of leaves with parent nodes according to the Boolean functions along a path that includes each of the at least one non-leaf node that is a parent relative to the respective plurality of leaves;
at least one non-root node of the decision graph having more than one parent node;
a forecaster that employs the regression model obtained from the model generator to forecast one or more future observations of the time series data, wherein the forecaster iterates through a set of potential regressors ordered in descending order of their correlation with a predicted variable associated with the one or more future observations, terminating the greedy search algorithm when addition of a given regressor does not improve the model score and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node; and
a display comprising output of the one or more future observations of the time series data, including predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by the data mining within the large amounts of data.
Dependent claims: 20, 21, 22
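The forecaster recited in claim 19 can be illustrated with a small sketch. The tuple encoding below is hypothetical (not from the patent): non-leaf nodes hold a Boolean test on one lagged observation, leaves hold a linear regression on one lag, and a node with more than one parent is simply the same object referenced from two branches.

```python
# Hypothetical node encoding (illustrative only):
#   ("test", lag_index, threshold, left_child, right_child)  -- non-leaf node
#   ("leaf", intercept, slope, lag_index)                    -- leaf regression
def forecast(node, lags):
    """Walk the decision graph: each non-leaf applies a Boolean test to one
    lagged observation; the reached leaf returns its linear regression."""
    if node[0] == "leaf":
        _, a, b, var = node
        return a + b * lags[var]
    _, var, thr, left, right = node
    return forecast(left if lags[var] <= thr else right, lags)

def multi_step(node, history, steps):
    """Iterated one-step forecasting: feed each prediction back in as the
    newest lag (lags[0] is the most recent observation)."""
    lags = list(history)
    out = []
    for _ in range(steps):
        y = forecast(node, lags)
        out.append(y)
        lags = [y] + lags[:-1]
    return out
```

Because children are plain references, two different tests can route to the same leaf; that shared leaf is the "non-root node having more than one parent node" that distinguishes a decision graph from a decision tree.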
23. A computer-implemented method of forecasting future observations in a sequence of time series data, comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
performing a greedy search to grow a model corresponding to the set of time series data represented by a decision graph having at least one non-trivial linear regression at leaves of the decision graph, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the potential regressors are arranged in the subset in descending order of their correlation to a predicted variable;
computing a Bayesian score for the model, wherein performing the greedy search further comprises splitting a leaf node of the model into a pair of additional leaves, each of the additional leaves including at least one linear regression on at least one variable of a set of potential regressors, and merging at least two leaf nodes of the decision graph provided that the merging improves the Bayesian score for the model, the decision graph being a regression decision graph model comprising a plurality of leaves and at least one non-leaf node, the at least one non-leaf node being associated with a Boolean function for one of a plurality of variables having a split value, at least one non-root node of the regression decision graph model having more than one parent in the regression decision graph model, at least two leaves of the regression decision graph model being associated with at least one linear regression on at least one of the variables so as to provide a piecewise linear regression decision graph model;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
if the model score does not improve, terminating the modification and computation, and providing a model having a model structure corresponding to a decision graph having a fixed number of leaves that include at least one non-trivial linear regression;
modifying a regressor variable at one of the leaves of the decision graph to provide a submodel;
computing a Bayesian score for the submodel;
repeating the modifying and computation of the Bayesian score for the submodel so long as the score of the submodel improves;
if the score of the submodel does not improve relative to a previous score of the submodel, providing the submodel with the highest model score as the regression model that best models future observations of the time series data; and
employing the best regression model to generate the future observations of the time series data.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
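The submodel refinement loop at the end of claim 23 (try a different regressor at a leaf, rescore, keep the best) can be sketched as follows. This is again a minimal illustration under assumed details: candidates are ranked by the magnitude of their Pearson correlation with the target, each candidate is scored by the BIC of a simple linear fit, and the loop terminates as soon as a regressor fails to improve the score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation, used to rank candidate regressors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def fit_score(xs, ys):
    """BIC of a least-squares line y = a + b*x (2 parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = max(rss / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) - math.log(n)

def refine_leaf(regressors, target):
    """regressors: dict of name -> values. Try candidates in descending
    order of |correlation| with the target; keep the best-scoring one,
    stopping as soon as a candidate fails to improve the score."""
    ordered = sorted(regressors,
                     key=lambda name: -abs(pearson(regressors[name], target)))
    best_name, best_score = None, -math.inf
    for name in ordered:
        s = fit_score(regressors[name], target)
        if s <= best_score:
            break  # a regressor that does not improve the score ends the search
        best_name, best_score = name, s
    return best_name, best_score
```

In a full implementation this would be invoked per leaf, using only the observations routed to that leaf by the Boolean tests above it.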
34. A method of predicting future observations of time series data, comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
employing a Bayesian model selection approach to construct a decision graph having a model structure that includes at least two leaves, at least one leaf of the decision graph including at least one non-trivial linear regression, the Bayesian model selection approach comprising a greedy search algorithm to grow the model by adding leaves to the model so long as the model improves;
providing a set of potential regressors having variables associated with the data, wherein the potential regressors are ordered according to their correlation, ranging from a most useful regressor to a least useful regressor relative to a target variable to be predicted;
employing the decision graph to predict one or more future observations in the time series data and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node; and
storing or displaying the one or more future observations.
35. A method of forecasting relative to time series data, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
performing a greedy search to grow a model corresponding to the set of time series data represented by a decision graph having at least one non-trivial linear regression at leaves of the decision graph, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the potential regressors are arranged in the subset in descending order of their correlation to a predicted variable, and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the subset of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
computing a Bayesian score for the model;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
if the model score does not improve, providing a model having a model structure corresponding to a decision graph having a fixed number of leaves that include at least one nontrivial linear regression;
modifying a regressor variable at one of the leaves of the decision graph to provide a submodel;
computing a Bayesian score for the submodel;
repeating the modifying so long as the score of the submodel improves;
if the score of the submodel does not improve relative to a previous model score, providing the regression decision graph model with the highest model score as a most likely model that provides a most accurate future observation for the time series data; and
storing or displaying the future observation of the time series data.
36. A computer-implemented system for predicting future observations of time series data having a set of associated variables, comprising:
means for making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
means for data mining within the large amounts of data;
means for learning a model structure via a greedy Bayesian model selection approach for data corresponding to time series data for which a set of potential regressor variables have been created, the model structure comprising a decision graph that includes a plurality of leaves, at least one of the plurality of leaves including at least one linear regression on at least one continuous variable of the set of associated variables, wherein the potential regressors associated with the at least one linear regression are arranged in the set in descending order of their correlation with the at least one continuous variable;
means for learning model parameters at the leaves of the model structure by adjusting the at least one variable on which the at least one linear regression is implemented for the at least one of the plurality of leaves, wherein the means for learning model parameters further comprises means for either adding or removing a potential regressor of the set of associated variables relative to a given leaf to provide a submodel, and the means for learning a model structure further comprises means for merging at least two leaves based on regressors contained in the at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
means for scoring the model in order to select a most suitable model to facilitate prediction;
means for generating one or more future observations within the time series data by employing a highest scoring model as the most suitable model to predict the future observations;
means for storing the one or more predicted future observations obtained during the data mining; and
means for splitting a leaf into a non-leaf node associated with one of the variables and a pair of leaves, wherein the means for splitting is applied iteratively to respective leaves of the model to grow the model so long as the means for scoring provides an improved model score.
Dependent claims: 37, 38, 39
40. A computer-implemented method to forecast future observations for time series data having a plurality of possible regressor variables associated with observations of the time series data, the time series data derived from data mining, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
progressively modifying a given regressor variable of the model to form a submodel, the model having at least one linear regression on at least one of the plurality of possible regressor variables, the model constructed employing a greedy search, wherein the plurality of possible regressor variables are arranged in descending order of their correlation relative to a target variable;
computing a score for the submodel;
repeating the progressive modification with another regressor variable of the plurality of possible regressor variables so long as the computed score improves;
improving the model by applying a highest scoring submodel as the regression model and performing a merge operation on leaves after the model has more than two leaves, wherein the regression model comprises a regression decision graph model comprising a plurality of leaves and at least one non-leaf node, the at least one non-leaf node being associated with a Boolean function for one of a plurality of variables having a split value, and at least one node of the decision graph having more than one parent node;
forecasting one or more future observations of the time series data by employing the regression model; and
storing or displaying the one or more future observations.
Dependent claims: 41, 42, 43, 44
Specification