Causal modeling of multi-dimensional hierarchical metric cubes
First Claim
1. A computer-implemented method for constructing and using a causal graphical analysis tool, the method comprising:
- storing, in a database accessible by a computer system, multidimensional data representing metrics related to an organization, the metrics being used by the computer system to perform prediction analysis on events associated with the organization;
constructing, by a processor of the computer system, from said multidimensional data, a causal graphical model representing a multidimensional hierarchical data structure, the multidimensional hierarchical data structure including one or more dimensions, each dimension having a plurality of nodes hierarchically arranged into a number of levels, each node having associated metric data, and including edges between nodes, each edge representing a directed relationship between metrics of connected nodes;
said causal graphical model constructing comprising;
acquiring, by the processor, first data corresponding to a first frontier representing a cut at a dimension level of the multi-dimensional hierarchical data structure, said first data being aggregated and segmented along each of the multiple dimensions;
performing, by the processor, modeling on the first data to obtain a first model and a corresponding first statistic, the performing the modeling including;
receiving, by the processor, input historical time series data $\{X_t\}_{t=1,\ldots,M}$, where each $X_t$ is a p-dimensional vector, M and p are integer numbers;
receiving, via a user interface, specification of one or more metrics constraints, one or more dimensional constraints or both metrics constraints and dimensional constraints to control construction of said causal graphical network;
setting a graph G to (V, E), where the V is a set of p features, and the E is a set of edges between the features;
for each feature y, which belongs to the set V, running a regression on every y in terms of past lagged variables, $X_{t-d}, \ldots, X_{t-1}$, for all features x, which belongs to the set V;
for each feature x, which belongs to the set V, placing an edge directed from the x to the y into set E if x is selected as a group by the regression; and
iterating through said multi-dimensional hierarchical data structure to further expand a frontier dimension of the first frontier to obtain a new frontier level, wherein at each iteration;
gathering, by the processor, further data corresponding to the expanded new frontier level, said further data being aggregated and segmented along each of the multiple dimensions;
applying, by the processor, the data modeling on the further data to obtain a further model and a corresponding further statistic;
comparing, by the processor, the first statistic of the first model and the further statistic of the further model;
setting, by the processor, the further model to be the causal graphical model and setting the new frontier level to be the first frontier in response to determining that the further statistic improves the first model statistic;
repeating, by the processor, the iterating and new frontier level expanding until there are no new frontier dimensions to further expand;
outputting, by the processor, the causal graphical model as structured data providing an expanded frontier level of statistically significant metric relationships and learned impacts between metric measures for conducting a causal analysis;
predicting, by the processor, future values of metrics of the causal graphical model based on an inference using the first data and the metrics of the causal graphical model;
identifying, by the processor, a main metric that includes predicted future values that deviates away from a mean of the main metric with respect to time;
determining, by the processor, a causal relationship between each metric of the causal graphical model and the main metric based on a comparison of each metric of the causal graphical model with the main metric, wherein each causal relationship indicates an effect of the main metric on a deviation distance between the metric of the causal graphical model with a respective desired value;
identifying, by the processor, a set of candidate metrics from the metrics of the causal graphical model, wherein each candidate metrics corresponds to a deviation distance below a threshold;
determining, by the processor, causal relations and associated measures of strengths at the expanded frontier level in the outputted structured data by calculating a causal strength, per each candidate relation between each candidate metric and the main metric, or a pair of metrics in the cut in the frontier, as a weighted sum of causal strengths of causal relations whose dimension nodes are equal or descendents of the each candidate relation, wherein each weighted sum is a result of an application of a weight on a candidate metric, and each weight is determined by a ratio between a value of a target metric at the first frontier and an aggregated value of target metrics at the first frontier;
receiving, by the processor and via the user interface, a request for a prediction analysis on a first candidate relation associated with a first candidate metric and a second candidate relation associated with a second candidate metric;
aggregating, by the processor, a first causal strength of the first relation and a second causal strength of the second relation, wherein the first causal strength is a first weighted sum of the first candidate metric, the second causal strength is a second weighted sum of the second candidate metric, and aggregating the first causal strength and the second causal strength instead of the first candidate metric and the second candidate metric provides an indication of a predicted effect on an event caused by activities associated with the first candidate relation and the second candidate relation instead of effects on the event caused by the first candidate metric and the second candidate metric; and
outputting, by the processor, the aggregated causal strengths on the user interface.
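The graph-construction steps recited in the claim above (a regression of each feature y on the lagged values of all features, then an edge from x to y when x is selected as a group) can be illustrated with a minimal Python sketch. The claim does not name the regression method. This sketch swaps in ordinary least squares followed by a threshold on the norm of each feature's group of lag coefficients, as a simple stand-in for a group-selecting regression. The function names, the `threshold` parameter, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np


def lagged_design(X, d):
    """Build the lagged design matrix (hypothetical helper).

    X has shape (M, p). The block of d columns for feature j holds
    X[t-1, j], ..., X[t-d, j] for each target time t = d, ..., M-1.
    """
    M, p = X.shape
    blocks = [
        np.column_stack([X[d - k:M - k, j] for k in range(1, d + 1)])
        for j in range(p)
    ]
    return np.hstack(blocks), X[d:]


def build_causal_graph(X, d=2, threshold=0.3):
    """Return (V, E), where E holds directed edges (x, y).

    Assumption: x counts as "selected as a group" for y when the norm of
    its d lag coefficients in the least-squares fit exceeds `threshold`.
    """
    M, p = X.shape
    Z, Y = lagged_design(X, d)
    Z1 = np.column_stack([np.ones(len(Z)), Z])  # add an intercept column
    V = list(range(p))
    E = set()
    for y in V:
        coef, *_ = np.linalg.lstsq(Z1, Y[:, y], rcond=None)
        beta = coef[1:].reshape(p, d)  # row x = lag coefficients of feature x
        for x in V:
            if np.linalg.norm(beta[x]) > threshold:
                E.add((x, y))
    return V, E
```

A sparsity-inducing group regression would make the selection inside the fit itself rather than by a threshold applied afterwards; the threshold here only keeps the sketch short.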
Abstract
A computing system initializes a first frontier to be a root of a multi-dimensional hierarchical data structure representing an entity. The system acquires first data corresponding to the first frontier. The system performs modeling on the first data to obtain a first model and a corresponding first statistic. The system expands a dimension of the first frontier. The system gathers second data corresponding to the expanded frontier. The system applies the data modeling on the second data to obtain a second model and a corresponding second statistic. The system compares the first statistic of the first model and the second statistic of the second model. The system sets the second model to be the first model in response to determining that the second model statistic is better than the first model statistic. The system outputs the first model.
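The frontier-expansion loop summarized in the abstract can be sketched as a greedy search, shown here in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `fit_model` is a hypothetical callable that returns a `(model, statistic)` pair for a given frontier, a larger statistic is assumed to be better, the frontier is represented as a mapping from each dimension to its current level (0 being the root), and a dimension whose expansion does not improve the statistic is not tried again.

```python
def expand_frontier(max_depth, fit_model):
    """Greedy frontier expansion (illustrative sketch).

    max_depth: dict mapping each dimension name to its deepest level index.
    fit_model: hypothetical callable, frontier dict -> (model, statistic).
    """
    # Initialize the first frontier to the root of every dimension.
    frontier = {dim: 0 for dim in max_depth}
    model, stat = fit_model(frontier)

    # Dimensions that can still be expanded by one level.
    pending = [dim for dim in max_depth if max_depth[dim] > 0]
    while pending:
        dim = pending.pop(0)
        candidate = dict(frontier)
        candidate[dim] += 1  # expand this dimension by one level
        cand_model, cand_stat = fit_model(candidate)
        if cand_stat > stat:
            # The expanded frontier becomes the current one.
            frontier, model, stat = candidate, cand_model, cand_stat
            if frontier[dim] < max_depth[dim]:
                pending.append(dim)
    return frontier, model, stat
```

In practice `fit_model` would aggregate and segment the data along the frontier and then run the modeling step; here it is left abstract so that the control flow stays visible.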
63 Citations
24 Claims
15. A computer-implemented causal graphical analysis system comprising:
-
a memory device accessible by a computer system, for storing multidimensional data representing metrics related to an organization, the metrics being used by the computer system to perform prediction analysis on events associated with the organization; and
a processor being connected to the memory device, wherein the processor is configured to implement a learning engine to;
construct, from said multidimensional data, a causal graphical model representing a multidimensional hierarchical data structure, the multidimensional hierarchical data structure including multiple dimensions, each dimension having a plurality of nodes hierarchically arranged into a number of levels, each node having associated metric data, and including edges between nodes, each edge representing a directed relationship between metrics of connected nodes;
wherein construction of the causal graphical model comprises;
acquire first data corresponding to a first frontier representing a cut at a dimension level of the multi-dimensional hierarchical data structure, said first data being aggregated and segmented along each of the multiple dimensions;
perform modeling on the first data to obtain a first model and a corresponding first statistic, the performing the modeling including;
receive historical input time series data $\{X_t\}_{t=1,\ldots,M}$, where each $X_t$ is a p-dimensional vector, M and p are integer numbers;
receive, via a user interface, specification of one or more metrics constraints, one or more dimensional constraints or both metrics constraints and dimensional constraints to control construction of said causal graphical network;
set a graph G to (V, E), where the V is a set of p features, and the E is a set of edges between the features;
for each feature y, which belongs to the set V, run a regression on every y in terms of past lagged variables, $X_{t-d}, \ldots, X_{t-1}$, for all features x, which belongs to the set V;
for each feature x, which belongs to the set V, place an edge directed from the x to the y if x is selected as a group by the regression; and
iterate through said multi-dimensional hierarchical data structure to further expand a frontier dimension of the first frontier to obtain a new frontier level, wherein at each iteration, the learning engine is further configured to;
gather further data corresponding to the expanded new frontier level, said further data being aggregated and segmented along each of the multiple dimensions;
apply the data modeling on the further data to obtain a further model and a corresponding further statistic;
compare the first statistic of the first model and the further statistic of the further model;
set the further model to be the causal graphical model and setting the new frontier level to the first frontier in response to determining that the further statistic improves the first model statistic;
output the causal graphical model as structured data providing an expanded frontier level of statistically significant metric relationships and learned impacts between metric measures for conducting a causal analysis;
predict future values of metrics of the causal graphical model based on an inference using the first data and the metrics of the causal graphical model;
identify a main metric that includes predicted future values that deviates away from a mean of the main metric with respect to time;
determine a causal relationship between each metric of the causal graphical model and the main metric based on a comparison of each metric of the causal graphical model with the main metric, wherein each causal relationship indicates an effect of the main metric on a deviation distance between the metric of the causal graphical model with a respective desired value;
identify a set of candidate metrics from the metrics of the causal graphical model, wherein each candidate metrics corresponds to a deviation distance below a threshold;
determine causal relations and associated measures of strengths at the expanded frontier level in the outputted structured data by calculating a causal strength, per each candidate relation between each candidate metric and the main metric, or a pair of metrics in the cut in the frontier, as a weighted sum of causal strengths of causal relations whose dimension nodes are equal or descendents of the each candidate relation, wherein each weighted sum is a result of an application of a weight on a candidate metric, and each weight is determined by a ratio between a value of a target metric at the first frontier and an aggregated value of target metrics at the first frontier;
receive, via the user interface, a request for a prediction analysis on a first candidate relation associated with a first candidate metric and a second candidate relation associated with a second candidate metric;
aggregate a first causal strength of the first relation and a second causal strength of the second relation, wherein the first causal strength is a first weighted sum of the first candidate metric, the second causal strength is a second weighted sum of the second candidate metric, and aggregating the first causal strength and the second causal strength instead of the first candidate metric and the second candidate metric provides an indication of a predicted effect on an event caused by activities associated with the first candidate relation and the second candidate relation instead of effects on the event caused by the first candidate metric and the second candidate metric; and
output the aggregated causal strengths on the user interface.
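The weighted-sum causal strength recited in the claim above can be illustrated with a short Python sketch. It assumes one plausible reading of the claim language: each descendant relation's strength is weighted by the ratio of its target metric value to the aggregated target metric value across the descendants being combined. The function name, the dictionary layout, and the example node names are hypothetical.

```python
def aggregate_causal_strength(child_strengths, target_values):
    """Weighted sum of descendant causal strengths (illustrative sketch).

    child_strengths: dict node -> causal strength of the relation at that node.
    target_values:   dict node -> value of the target metric at that node.
    Each weight is target_values[node] / sum of target values over the nodes
    being combined (an assumed reading of the claim).
    """
    total = sum(target_values[node] for node in child_strengths)
    if total == 0:
        raise ValueError("aggregated target metric value is zero")
    return sum(
        strength * target_values[node] / total
        for node, strength in child_strengths.items()
    )


# Example with made-up numbers: two descendant nodes of one dimension.
strengths = {"US": 0.6, "EU": 0.2}
targets = {"US": 300.0, "EU": 100.0}
combined = aggregate_causal_strength(strengths, targets)
```

With these made-up inputs the weights are 0.75 and 0.25, so the combined strength is 0.6 × 0.75 + 0.2 × 0.25 = 0.5.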
-
-
23. A computer program product for a causal graphical analysis tool, the computer program product comprising a non-transitory storage medium readable by a processing circuit and storing instructions run by the processing circuit for performing a method, the method comprising:
-
storing, in a database accessible by a computer system, multidimensional data representing metrics related to an organization, the metrics being used by the computer system to perform prediction analysis on events associated with the organization;
constructing, by a learning engine implemented on a processor at said computer system, from said multidimensional data, a causal graphical model representing a multidimensional hierarchical data structure, the multidimensional hierarchical data structure including multiple dimensions, each dimension having a plurality of nodes hierarchically arranged into a number of levels, each node having associated metric data, and including edges between nodes, each edge representing a directed relationship between metrics of connected nodes;
said causal graphical model constructing comprising;
acquiring, by the learning engine, first data corresponding to a first frontier representing a cut at a dimension level of the multi-dimensional hierarchical data structure, said first data being aggregated and segmented along each of the multiple dimensions;
performing, by the learning engine, data modeling on the first data to obtain a first model and a corresponding first statistic, the performing the modeling including;
receiving historical input time series data $\{X_t\}_{t=1,\ldots,M}$, where each $X_t$ is a p-dimensional vector, M and p are integer numbers;
receiving, via a user interface, specification of one or more metrics constraints, one or more dimensional constraints or both metrics constraints and dimensional constraints to control construction of said causal graphical network;
setting a graph G to (V, E), where the V is a set of p features, and the E is a set of edges between the features;
for each feature y, which belongs to the set V, running a regression on every y in terms of past lagged variables, $X_{t-d}, \ldots, X_{t-1}$, for all features x, which belongs to the set V;
for each feature x, which belongs to the set V, placing an edge directed from the x to the y if x is selected as a group by the regression; and
iterating through said multi-dimensional hierarchical data structure to further expand a frontier dimension of the first frontier to obtain a new frontier level, wherein at each iteration, said processing circuit performs;
gathering, by the learning engine, further data corresponding to the expanded new frontier level, said further data being aggregated and segmented along each of the multiple dimensions;
applying, by the learning engine, the modeling on the further data to obtain a further model and a corresponding further statistic;
comparing, by the learning engine, the first statistic of the first model and the further statistic of the further model;
setting, by the learning engine, the further model to be the causal graphical model and setting the new frontier level to the first frontier in response to determining that the further statistic is better than the first model statistic;
outputting, by the learning engine, the causal graphical model as structured data representing an expanded frontier level of statistically significant metric relationships and learned impacts between metric measures for conducting a causal analysis;
predicting, by the processor, future values of metrics of the causal graphical model based on an inference using the first data and the metrics of the causal graphical model;
identifying, by the processor, a main metric that includes predicted future values that deviates away from a mean of the main metric with respect to time;
determining, by the processor, a causal relationship between each metric of the causal graphical model and the main metric based on a comparison of each metric of the causal graphical model with the main metric, wherein each causal relationship indicates an effect of the main metric on a deviation distance between the metric of the causal graphical model with a respective desired value;
identifying, by the processor, a set of candidate metrics from the metrics of the causal graphical model, wherein each candidate metrics corresponds to a deviation distance below a threshold;
determining, by the learning engine, causal relations and associated measures of strengths in the expanded frontier level in the outputted structured data by calculating a causal strength, per each candidate relation between each candidate metric and the main metric, or a pair of metrics in the cut in the frontier, as a weighted sum of causal strengths of causal relations whose dimension nodes are equal or descendents of the each candidate relation, wherein each weighted sum is a result of an application of a weight on a candidate metric, and each weight is determined by a ratio between a value of a target metric at the first frontier and an aggregated value of target metrics at the first frontier;
receiving, by the processor and via the user interface, a request for a prediction analysis on a first candidate relation associated with a first candidate metric and a second candidate relation associated with a second candidate metric;
aggregating, by the processor, a first causal strength of the first relation and a second causal strength of the second relation, wherein the first causal strength is a first weighted sum of the first candidate metric, the second causal strength is a second weighted sum of the second candidate metric, and aggregating the first causal strength and the second causal strength instead of the first candidate metric and the second candidate metric provides an indication of a predicted effect on an event caused by activities associated with the first candidate relation and the second candidate relation instead of effects on the event caused by the first candidate metric and the second candidate metric; and
outputting, by the processor, the aggregated causal strengths on the user interface.
-
Specification