Decision forest based classifier for determining predictive importance in real-time data analysis
First Claim
1. A computer-implemented method comprising:
- generating predictive importance via an instance classification software and predictive importance network generator stored at a storage medium and executed by a processor coupled with the storage medium, the generating of the predictive importance including for a first feature of a data set having a plurality of features, training a classifier to predict the first feature in terms of other features in the data set to obtain a trained classifier, wherein the data set is progressively classified into branches via a decision criterion such that the decision criterion is applied at each decision point, the decision criterion including functions of features, the features including the first feature and a second feature, wherein the classifier includes a forest based classifier;
scrambling the values of the second feature in the data set to obtain a scrambled data set, wherein scrambling including repeating the values of the second feature to be used for determining the predictive importance of the second feature;
executing the trained classifier on the scrambled data set, wherein the trained classifier to facilitate distinguishing of content of the data set by relevancy of the content, wherein the relevancy is based on features contained in the data set;
determining the predictive importance of the second feature in predicting the first feature based at least in part on the accuracy of the trained classifier in predicting the first feature when executed with the scrambled data set, and based in part of other relevant features while ignoring other irrelevant features;
creating a graph of the data set in which each of the first and the second features is a node of the graph and a label on an edge between the first node and the second node is based at least in part on the predictive importance of the first feature in terms of the second feature;
applying the predictive importance to perform real-time diagnosis of factors including one or more of real-time medical analysis of a disease trend, real-time configuration of manufacturing settings to manufacture a product, and real-time safety analysis of a product; and
displaying, via a display device, the real-time diagnosis of the factors based on the predictive importance.
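The scrambling, executing, and determining steps of claim 1 resemble what is commonly called permutation importance. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the names `accuracy`, `scramble`, and `predictive_importance` are hypothetical, a hand-written rule stands in for the trained forest based classifier, the data rows are made up for illustration, and averaging over `repeats` shuffles is an assumption of this sketch rather than something the claim specifies.

```python
import random


def accuracy(predict, rows, target):
    """Fraction of rows whose target feature is predicted correctly."""
    hits = sum(1 for row in rows if predict(row) == row[target])
    return hits / len(rows)


def scramble(rows, feature, rng):
    """Copy the rows, shuffling the values of one feature across rows."""
    values = [row[feature] for row in rows]
    rng.shuffle(values)
    return [{**row, feature: value} for row, value in zip(rows, values)]


def predictive_importance(predict, rows, target, feature, repeats=20, seed=0):
    """Mean drop in accuracy on the target when `feature` is scrambled.

    `repeats` and `seed` are assumptions of this sketch.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, target)
    drops = []
    for _ in range(repeats):
        scrambled = scramble(rows, feature, rng)
        drops.append(baseline - accuracy(predict, scrambled, target))
    return sum(drops) / repeats


# Made-up illustrative data: "fault" (first feature) depends only on "temp".
rows = [
    {"temp": t, "noise": i % 2, "fault": t > 50}
    for i, t in enumerate(range(10, 101, 10))
]


def stand_in_classifier(row):
    """Hypothetical stand-in for a trained forest based classifier."""
    return row["temp"] > 50


imp_temp = predictive_importance(stand_in_classifier, rows, "fault", "temp")
imp_noise = predictive_importance(stand_in_classifier, rows, "fault", "noise")
print("temp:", imp_temp, "noise:", imp_noise)
```

In this toy setup, scrambling `noise` leaves the accuracy unchanged because the stand-in rule never reads it, while scrambling `temp` lowers the accuracy, so `temp` receives the larger importance score.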
Abstract
For a first feature of a data set having a plurality of features, training a classifier to predict the first feature in terms of other features in the data set to obtain a trained classifier; scrambling the values of a second feature in the data set to obtain a scrambled data set; executing the trained classifier on the scrambled data set; determining predictive importance of the second feature in predicting the first feature based at least in part on the accuracy of the trained classifier in predicting the first feature when executed with the scrambled data set; and creating a graph of the data set in which each of the first and the second features is a node of the graph and a label on an edge between the first node and the second node is based at least in part on the predictive importance of the first feature in terms of the second feature.
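The graph described at the end of the abstract can be pictured as a mapping from feature pairs to importance labels. The sketch below is one possible representation, assuming the importance scores have already been computed; the function name `build_importance_graph`, the `threshold` parameter, and the example scores are all hypothetical and chosen only for illustration.

```python
def build_importance_graph(importance, threshold=0.0):
    """Build a labelled graph from pairwise importance scores.

    `importance` maps (first_feature, second_feature) to the importance of
    the second feature in predicting the first. Every feature becomes a
    node; an edge is kept only if its score exceeds `threshold` (an
    assumption of this sketch).
    """
    nodes = sorted({feature for pair in importance for feature in pair})
    edges = {
        pair: score for pair, score in importance.items() if score > threshold
    }
    return {"nodes": nodes, "edges": edges}


# Placeholder scores, not measured results.
scores = {
    ("fault", "temp"): 0.48,
    ("fault", "noise"): 0.0,
    ("temp", "fault"): 0.45,
}

graph = build_importance_graph(scores)
for (first, second), label in graph["edges"].items():
    print(f"{second} -> {first}: {label}")
```

Here every feature remains a node, while the pair with zero importance contributes no edge, which is one simple way to keep irrelevant features out of the displayed graph.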
24 Citations
17 Claims
1. A computer-implemented method comprising:
generating predictive importance via an instance classification software and predictive importance network generator stored at a storage medium and executed by a processor coupled with the storage medium, the generating of the predictive importance including for a first feature of a data set having a plurality of features, training a classifier to predict the first feature in terms of other features in the data set to obtain a trained classifier, wherein the data set is progressively classified into branches via a decision criterion such that the decision criterion is applied at each decision point, the decision criterion including functions of features, the features including the first feature and a second feature, wherein the classifier includes a forest based classifier;
scrambling the values of the second feature in the data set to obtain a scrambled data set, wherein scrambling including repeating the values of the second feature to be used for determining the predictive importance of the second feature;
executing the trained classifier on the scrambled data set, wherein the trained classifier to facilitate distinguishing of content of the data set by relevancy of the content, wherein the relevancy is based on features contained in the data set;
determining the predictive importance of the second feature in predicting the first feature based at least in part on the accuracy of the trained classifier in predicting the first feature when executed with the scrambled data set, and based in part of other relevant features while ignoring other irrelevant features;
creating a graph of the data set in which each of the first and the second features is a node of the graph and a label on an edge between the first node and the second node is based at least in part on the predictive importance of the first feature in terms of the second feature;
applying the predictive importance to perform real-time diagnosis of factors including one or more of real-time medical analysis of a disease trend, real-time configuration of manufacturing settings to manufacture a product, and real-time safety analysis of a product; and
displaying, via a display device, the real-time diagnosis of the factors based on the predictive importance.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer system comprising:
a processor communicatively coupled to a storage device;
the storage device having stored thereon an instance classification software and predictive importance network generator when executed by a processor, causes the processor to generate predictive importance, the processor is further to:
for a first feature of a dataset having a plurality of features, train a classifier to predict the first feature in terms of other features in the data set to obtain a trained classifier, wherein the data set is progressively classified into branches via a decision criterion such that the decision criterion is applied at each decision point, the decision criterion including functions of features, the features including the first feature and a second feature, wherein the classifier includes a forest based classifier;
scramble the values of the second feature in the data set to obtain a scrambled data set, wherein scrambling including repeating the values of the second feature to be used for determining predictive importance of the second feature, wherein the trained classifier is to facilitate distinguishing of content of the data set by relevancy of the content, wherein the relevancy is based on features contained in the data set;
execute the trained classifier on the scrambled data set;
determine the predictive importance of the second feature in predicting the first feature based at least in part on the accuracy of the trained classifier in predicting the first feature when executed with the scrambled data set, and based in part of other relevant features while ignoring other irrelevant features;
create a graph of the data set in which each of the first and the second features is a node of the graph and a label on an edge between the first node and the second node is based at least in part on the predictive importance of the first feature in terms of the second feature;
apply the predictive importance to perform real-time diagnosis of factors including one or more of real-time medical analysis of a disease trend, real-time configuration of manufacturing settings to manufacture a product, and real-time safety analysis of a product; and
display, via a display device, the real-time diagnosis of the factors based on the predictive importance.
Dependent claims: 9, 10
11. A machine readable medium comprising instructions which, when executed, causes a machine to:
generate predictive importance via an instance classification software and predictive importance network generator stored at a storage medium and executed by a processor coupled with the storage medium, wherein the instructions which when executed to generate the predictive importance, further cause the machine to for a first feature of a dataset having a plurality of features, train a classifier to predict the first feature in terms of other features in the data set to obtain a trained classifier, wherein the data set is progressively classified into branches via a decision criterion such that the decision criterion is applied at each decision point, the decision criterion including functions of features, the features including the first feature and a second feature, wherein the classifier includes a forest based classifier;
scramble the values of the second feature in the data set to obtain a scrambled data set, wherein scrambling including repeating the values of the second feature to be used for determining the predictive importance of the second feature, wherein the trained classifier to facilitate distinguishing of content of the data set by relevancy of the content, wherein the relevancy is based on features contained in the data set;
execute the trained classifier on the scrambled data set; and
determine the predictive importance of the second feature in predicting the first feature based at least in part on the accuracy of the trained classifier in predicting the first feature when executed with the scrambled data set, and based in part of other relevant features while ignoring other irrelevant features;
create a graph of the data set in which each of the first and the second features is a node of the graph and a label on an edge between the first node and the second node is based at least in part on the predictive importance of the first feature in terms of the second feature;
applying the predictive importance to perform real-time diagnosis of factors including one or more of real-time medical analysis of a disease trend, real-time configuration of manufacturing settings to manufacture a product, and real-time safety analysis of a product; and
displaying, via a display device, the real-time diagnosis of the factors based on the predictive importance.
Dependent claims: 12, 13, 14, 15, 16, 17
Specification