Bayesian approach for learning regression decision graph models and regression models for time series analysis
First Claim
1. A computer-implemented data analysis method that makes predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
employing a Bayesian model selection approach to construct a decision graph based on the data relating to observations of time series data, the decision graph having a model structure that includes at least two leaves, at least one leaf of the decision graph including at least one nontrivial linear regression, wherein the Bayesian model selection approach comprises a greedy search algorithm to grow the model by adding leaves to the model so long as the model improves and performing a merge operation on the leaves after the model has more than two leaves;
providing a set of potential regressors having variables associated with the data, wherein the potential regressors are ordered in descending order according to their correlation relative to a target variable to be predicted, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
computing a Bayesian score for a split leaf model and a merge model;
storing the model with the higher computed Bayesian score;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
terminating the iterative process if a regressor does not improve the model score;
employing the decision graph to predict future observations in the time series data;
employing a split leaf operator at one leaf of the decision graph to grow the decision graph to include additional leaves, each of the additional leaves including at least one linear regression on at least one variable of the set of potential regressors;
storing or displaying the predicted future observations of the time series data, wherein the predicted future observations are based on the decision graph and include implicit, previously unknown information obtained from mining the data.
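The greedy split/merge search with Bayesian scoring recited above can be sketched in code. This is a minimal illustration under assumed details, not the patented method itself: leaves are modeled as point sets over a single variable, the Bayesian score is approximated by the BIC of a per-leaf least-squares line (the claim does not fix a particular scoring formula), and splits are made at the leaf median.

```python
import math

def fit_leaf(pts):
    """Ordinary least squares y = a + b*x over one leaf; returns the
    residual sum of squares of the fit."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    b = sum((x - mx) * (y - my) for x, y in pts) / sxx if sxx else 0.0
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

def leaf_score(pts):
    """BIC-style proxy for the Bayesian score of one leaf (2 parameters)."""
    n = len(pts)
    sigma2 = max(fit_leaf(pts) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return loglik - math.log(n)  # penalty: 0.5 * 2 params * log n

def greedy_graph(points, max_iter=20):
    """Grow a piecewise-linear model by greedy split/merge: at each step
    keep the candidate model with the higher total score, and terminate
    when no operation improves the score."""
    leaves = [sorted(points)]
    score = sum(leaf_score(l) for l in leaves)
    for _ in range(max_iter):
        best = None
        # Split operator: cut a (sorted) leaf at its median x.
        for i, leaf in enumerate(leaves):
            mid = len(leaf) // 2
            lo, hi = leaf[:mid], leaf[mid:]
            if len(lo) < 3 or len(hi) < 3:
                continue
            cand = leaves[:i] + leaves[i + 1:] + [lo, hi]
            s = sum(leaf_score(l) for l in cand)
            if best is None or s > best[0]:
                best = (s, cand)
        # Merge operator: applied only once the model has more than two leaves.
        if len(leaves) > 2:
            for i in range(len(leaves)):
                for j in range(i + 1, len(leaves)):
                    merged = sorted(leaves[i] + leaves[j])
                    cand = [l for k, l in enumerate(leaves)
                            if k not in (i, j)] + [merged]
                    s = sum(leaf_score(l) for l in cand)
                    if best is None or s > best[0]:
                        best = (s, cand)
        if best is None or best[0] <= score:
            break  # no split or merge improves the Bayesian score
        score, leaves = best
    return leaves, score
```

On data with a regime change (for example, slope 2 for x at or above zero and slope -3 below), the search typically accepts at least one split, yielding the piecewise-linear structure the claim describes.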
2 Assignments
0 Petitions
Abstract
Methods and systems are disclosed for learning a regression decision graph model using a Bayesian model selection approach. In a disclosed aspect, the model structure and/or model parameters can be learned using a greedy search algorithm applied to grow the model so long as the model improves. This approach enables construction of a decision graph having a model structure that includes a plurality of leaves, at least one of which includes a non-trivial linear regression. The resulting model thus can be employed for forecasting, such as for time series data, which can include single or multi-step forecasting.
133 Citations
44 Claims
1. (Independent claim; text identical to the First Claim reproduced above.)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A system that facilitates forecasting of time series data, comprising:
a computer processor that stores, in a memory communicatively coupled to the computer processor, computer-executable instructions, the execution of which by the processor makes predictions relative to time series data and performs data mining of large amounts of data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by the data mining from the large amounts of data;
a model generator that generates a decision graph employed for predicting future observations of time series data, the decision graph being structured as a regression model via a greedy search algorithm and comprising:
at least one non-leaf node associated with a Boolean function for one of a plurality of previous variables for the time series;
a plurality of leaves, each of the plurality of leaves being associated with at least one functional formula corresponding to a non-trivial linear regression for previous observations in the time series data;
respective edges associating the respective functional formulas of the plurality of leaves with parent nodes according to the Boolean functions along a path that includes each of the at least one non-leaf node that is a parent relative to the respective plurality of leaves;
at least one non-root node of the decision graph having more than one parent node;
a forecaster that employs the regression model obtained from the model generator to forecast one or more future observations of the time series data, wherein the forecaster iterates through a set of potential regressors ordered in descending order of their correlation with a predicted variable associated with the one or more future observations, terminating the greedy search algorithm when addition of a given regressor does not improve the model score and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node; and
a display comprising output of the one or more future observations of the time series data, including predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by the data mining within the large amounts of data.
Dependent claims: 20, 21, 22
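The forecaster recited in claim 19 can be illustrated with a small sketch. The tuple encoding below is hypothetical (not from the patent): non-leaf nodes hold a Boolean test on one lagged observation, leaves hold a linear regression on one lag, and a node with more than one parent is simply the same object referenced from two branches.

```python
# Hypothetical node encoding (illustrative only):
#   ("test", lag_index, threshold, left_child, right_child)  -- non-leaf node
#   ("leaf", intercept, slope, lag_index)                    -- leaf regression
def forecast(node, lags):
    """Walk the decision graph: each non-leaf applies a Boolean test to one
    lagged observation; the reached leaf returns its linear regression."""
    if node[0] == "leaf":
        _, a, b, var = node
        return a + b * lags[var]
    _, var, thr, left, right = node
    return forecast(left if lags[var] <= thr else right, lags)

def multi_step(node, history, steps):
    """Iterated one-step forecasting: feed each prediction back in as the
    newest lag (lags[0] is the most recent observation)."""
    lags = list(history)
    out = []
    for _ in range(steps):
        y = forecast(node, lags)
        out.append(y)
        lags = [y] + lags[:-1]
    return out
```

Because children are plain references, two different tests can route to the same leaf; that shared leaf is the "non-root node having more than one parent node" that distinguishes a decision graph from a decision tree.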
23. A computer-implemented method of forecasting future observations in a sequence of time series data, comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
performing a greedy search to grow a model corresponding to the set of time series data represented by a decision graph having at least one non-trivial linear regression at leaves of the decision graph, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the potential regressors are arranged in the subset in descending order of their correlation to a predicted variable;
computing a Bayesian score for the model, wherein performing the greedy search further comprises splitting a leaf node of the model into a pair of additional leaves, each of the additional leaves including at least one linear regression on at least one variable of a set of potential regressors, and merging at least two leaf nodes of the decision graph provided that the merging improves the Bayesian score for the model, the decision graph being a regression decision graph model comprising a plurality of leaves and at least one non-leaf node, the at least one non-leaf node being associated with a Boolean function for one of a plurality of variables having a split value, at least one non-root node of the regression decision graph model having more than one parent in the regression decision graph model, at least two leaves of the regression decision graph model being associated with at least one linear regression on at least one of the variables so as to provide a piecewise linear regression decision graph model;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
if the model score does not improve, terminating the modification and computation, and providing a model having a model structure corresponding to a decision graph having a fixed number of leaves that include at least one non-trivial linear regression;
modifying a regressor variable at one of the leaves of the decision graph to provide a submodel;
computing a Bayesian score for the submodel;
repeating the modifying and computation of the Bayesian score for the submodel so long as the score of the submodel improves;
if the score of the submodel does not improve relative to a previous score of the submodel, providing the submodel with the highest model score as the regression model that best models future observations of the time series data; and
employing the best regression model to generate the future observations of the time series data.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
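The submodel refinement loop at the end of claim 23 (try a different regressor at a leaf, rescore, keep the best) can be sketched as follows. This is again a minimal illustration under assumed details: candidates are ranked by the magnitude of their Pearson correlation with the target, each candidate is scored by the BIC of a simple linear fit, and the loop terminates as soon as a regressor fails to improve the score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation, used to rank candidate regressors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def fit_score(xs, ys):
    """BIC of a least-squares line y = a + b*x (2 parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = max(rss / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) - math.log(n)

def refine_leaf(regressors, target):
    """regressors: dict of name -> values. Try candidates in descending
    order of |correlation| with the target; keep the best-scoring one,
    stopping as soon as a candidate fails to improve the score."""
    ordered = sorted(regressors,
                     key=lambda name: -abs(pearson(regressors[name], target)))
    best_name, best_score = None, -math.inf
    for name in ordered:
        s = fit_score(regressors[name], target)
        if s <= best_score:
            break  # a regressor that does not improve the score ends the search
        best_name, best_score = name, s
    return best_name, best_score
```

In a full implementation this would be invoked per leaf, using only the observations routed to that leaf by the Boolean tests above it.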
34. A method of predicting future observations of time series data, comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
employing a Bayesian model selection approach to construct a decision graph having a model structure that includes at least two leaves, at least one leaf of the decision graph including at least one non-trivial linear regression, the Bayesian model selection approach comprising a greedy search algorithm to grow the model by adding leaves to the model so long as the model improves;
providing a set of potential regressors having variables associated with the data, wherein the potential regressors are ordered according to their correlation, ranging from a most useful regressor to a least useful regressor relative to a target variable to be predicted;
employing the decision graph to predict one or more future observations in the time series data and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the set of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node; and
storing or displaying the one or more future observations.
35. A method of forecasting relative to time series data, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
performing a greedy search to grow a model corresponding to the set of time series data represented by a decision graph having at least one non-trivial linear regression at leaves of the decision graph, the greedy search algorithm being performed iteratively relative to respective leaves of the model for a subset of potential regressors, wherein the potential regressors are arranged in the subset in descending order of their correlation to a predicted variable, and performing a merge operation on the leaves after the model has more than two leaves, wherein the non-trivial linear regression at the at least one leaf corresponds to at least one variable of the subset of potential regressors and the merge operator operates on at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
computing a Bayesian score for the model;
repeating the performance of the greedy search and computation of the Bayesian score so long as the model score improves;
if the model score does not improve, providing a model having a model structure corresponding to a decision graph having a fixed number of leaves that include at least one nontrivial linear regression;
modifying a regressor variable at one of the leaves of the decision graph to provide a submodel;
computing a Bayesian score for the submodel;
repeating the modifying so long as the score of the submodel improves;
if the score of the submodel does not improve relative to a previous model score, providing the regression decision graph model with the highest model score as a most likely model that provides a most accurate future observation for the time series data; and
storing or displaying the future observation of the time series data.
36. A computer-implemented system for predicting future observations of time series data having a set of associated variables, comprising:
means for making predictions relative to the time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
means for data mining within the large amounts of data;
means for learning a model structure via a greedy Bayesian model selection approach for data corresponding to time series data for which a set of potential regressor variables have been created, the model structure comprising a decision graph that includes a plurality of leaves, at least one of the plurality of leaves including at least one linear regression on at least one continuous variable of the set of associated variables, wherein the potential regressors associated with the at least one linear regression are arranged in the set in descending order of their correlation with the at least one continuous variable;
means for learning model parameters at the leaves of the model structure by adjusting the at least one variable on which the at least one linear regression is implemented for the at least one of the plurality of leaves, wherein the means for learning model parameters further comprises means for either adding or removing a potential regressor of the set of associated variables relative to a given leaf to provide a submodel, and the means for learning a model structure further comprises means for merging at least two leaves based on regressors contained in the at least two leaves so that at least one non-root node of the decision graph has more than one parent node;
means for scoring the model in order to select a most suitable model to facilitate prediction;
means for generating one or more future observations within the time series data by employing a highest scoring model as the most suitable model to predict the future observations;
means for storing the one or more predicted future observations obtained during the data mining; and
means for splitting a leaf into a non-leaf node associated with one of the variables and a pair of leaves, wherein the means for splitting is applied iteratively to respective leaves of the model to grow the model so long as the means for scoring provides an improved model score.
Dependent claims: 37, 38, 39
40. A computer-implemented method to forecast future observations for time series data having a plurality of possible regressor variables associated with observations of the time series data, the time series data derived from data mining, the method comprising:
storing, in a memory communicatively coupled to a processor, computer-executable instructions for performing the method of making predictions relative to time series data, the predictions related to nontrivial extractions of implicit, previously unknown information obtained by data mining within large amounts of data;
executing the instructions on the processor, wherein the instructions result in data mining within the large amounts of data;
according to the instructions being executed:
progressively modifying a given regressor variable of the model to form a submodel, the model having at least one linear regression on at least one of the plurality of possible regressor variables, the model constructed employing a greedy search, wherein the plurality of possible regressor variables are arranged in descending order of their correlation relative to a target variable;
computing a score for the submodel;
repeating the progressive modification with another regressor variable of the plurality of possible regressor variables so long as the computed score improves;
improving the model by applying a highest scoring submodel as the regression model and performing a merge operation on leaves after the model has more than two leaves, wherein the regression model comprises a regression decision graph model comprising a plurality of leaves and at least one non-leaf node, the at least one non-leaf node being associated with a Boolean function for one of a plurality of variables having a split value, and at least one node of the decision graph having more than one parent node;
forecasting one or more future observations of the time series data by employing the regression model; and
storing or displaying the one or more future observations.
Dependent claims: 41, 42, 43, 44
Specification