FlexSCAPE: Data Driven Hypothesis Testing and Generation System
First Claim
1. In a computer system, having one or more processors or virtual machines, each processor comprising at least one core, one or more memory units, one or more input devices and one or more output devices, optionally a network, and optionally shared memory supporting communication among the processors, a method for automatically generating and testing a hypothesis from a data set comprising the steps of:
(a) selecting at least one informative combination of interacting features from a data set from the one or more memory units using a mutual information measure of the feature combination as the evaluation criterion;
(b) building at least one graphical model from at least one informative combination of interacting features;
(c) generating a hypothesis from at least one graphical model by optimizing a statistical measure associated with at least one state of at least one feature wherein the hypothesis is defined by at least one state associated with at least one feature from the data set; and
(d) testing at least one hypothesis generated from substep (c) from at least one graphical model.
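Steps (a) through (d) above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy XOR data set, and the use of a single conditional probability table as a stand-in for the graphical model are all assumptions made for illustration, and the "test" in step (d) is simply the observed support for the hypothesis in the data.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def mutual_information(rows, features, target):
    """Empirical I(features; target) in bits, treating the feature tuple as one joint variable."""
    n = len(rows)
    joint = Counter((tuple(r[f] for f in features), r[target]) for r in rows)
    px = Counter(tuple(r[f] for f in features) for r in rows)
    py = Counter(r[target] for r in rows)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def select_combination(rows, candidates, target, size=2):
    """Step (a): pick the feature combination with the highest mutual information."""
    return max(combinations(candidates, size),
               key=lambda fs: mutual_information(rows, fs, target))


def build_model(rows, features, target):
    """Step (b): estimate P(target | features), a one-node-with-parents model (assumed stand-in)."""
    counts = {}
    for r in rows:
        state = tuple(r[f] for f in features)
        counts.setdefault(state, Counter())[r[target]] += 1
    return {s: {y: c / sum(cs.values()) for y, c in cs.items()}
            for s, cs in counts.items()}


def generate_hypothesis(model, outcome):
    """Step (c): choose the feature state that maximizes P(target = outcome | state)."""
    return max(model, key=lambda s: model[s].get(outcome, 0.0))


def check_hypothesis(rows, features, target, state, outcome):
    """Step (d): fraction of rows in `state` whose target equals `outcome` (ideally held-out rows)."""
    matching = [r for r in rows if tuple(r[f] for f in features) == state]
    if not matching:
        return None
    return sum(r[target] == outcome for r in matching) / len(matching)


# Toy data (hypothetical): y = a XOR b, and c is an irrelevant feature.
rows = [{"a": a, "b": b, "c": c, "y": a ^ b} for a, b, c in product([0, 1], repeat=3)]

best = select_combination(rows, ["a", "b", "c"], "y")
model = build_model(rows, best, "y")
hypothesis = generate_hypothesis(model, outcome=1)
support = check_hypothesis(rows, best, "y", hypothesis, outcome=1)
print(best, hypothesis, support)
```

The XOR data is chosen because neither `a` nor `b` alone carries information about `y`, while the pair does, which is the kind of interacting feature combination that step (a) is meant to find.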
Abstract
The present invention relates to a method for generating hypotheses automatically from graphical models built directly from data. The method links three key scientific concepts to enable hypothesis generation from data-driven hypothesis models: the use of information-theory-based measures to identify informative feature subsets within the data; the automatic generation of graphical models from those informative subsets; and the application of optimization methods to the graphical models to generate hypotheses. Integrating these three concepts can enable scalable approaches to hypothesis generation in large, complex data environments. Using graphical models as the model representation also allows prior knowledge to be integrated effectively into the modeling environment.
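For reference, one standard information-theoretic measure of how informative a feature subset is about a variable of interest is the mutual information below. The text here does not state which variant the method uses; the symbols X_S (the joint state of the features in subset S), Y (the variable of interest), and p (the empirical distribution) are notation assumed for this illustration.

```latex
I(X_S; Y) = \sum_{x_S} \sum_{y} p(x_S, y) \, \log \frac{p(x_S, y)}{p(x_S)\, p(y)}
```

The measure is zero when X_S and Y are independent and grows as knowing the joint state of the subset reduces uncertainty about Y.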
20 Claims
Specification