Distributed hierarchical evolutionary modeling and visualization of empirical data
Abstract
A method and machine-readable storage medium for distributed hierarchical evolutionary modeling and visualization of empirical data, which create an empirical modeling system based upon previously acquired data. The data represent inputs to the system and the corresponding outputs from the system. The method and machine-readable storage medium use an entropy function, based upon information theory and the principles of thermodynamics, to accurately predict system outputs from subsequently acquired inputs. They identify the most information-rich (i.e., optimum) representation of a data set in order to reveal the underlying order, or structure, of what appears to be a disordered system. Evolutionary programming is one method used to identify the optimum representation of the data.
68 Claims
1. A computer-implemented method of selecting a feature set having a global informational content above a predefined threshold, the feature set being selected from an initial feature set of inputs corresponding to inputs to a system having measurable inputs and outputs,
wherein a large number of input data points to the system and corresponding output data points from the system are acquired to define a data set, and the acquired input and output data points are stored in a storage device, the method comprising the steps of:
(a) creating a plurality of feature subspaces, each said feature subspace comprising a set of features from the data set,
(b) quantizing the inputs of the data set, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing said feature subspace into a plurality of cells,
(c) determining the global level of informational content of each feature subspace by calculating at least one local cell Nishi-formulated entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E), and
(d) selecting at least one feature set that has a global informational content above the predefined threshold.
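The entropic selection in claim 1 can be illustrated with a minimal Python sketch. The claims do not spell out the Nishi-formulated entropy or how local weights combine into a global score, so this sketch assumes a normalized Shannon entropy of the output classes in each cell as a stand-in for E, equal-width subranges, discrete outputs, and a population-weighted mean of W = 1 − E as the global informational content. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def quantize(value, lo, hi, n_bins):
    """Step (b): map a value to one of n_bins equal-width subranges (assumed binning)."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def local_entropy(counts, n_classes):
    """Stand-in for the Nishi-formulated entropy E (not defined in the claims):
    Shannon entropy of the cell's output classes, normalized to [0, 1]."""
    if n_classes < 2:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c)
    return h / math.log(n_classes)


def global_content(X, y, subspace, n_bins):
    """Step (c): population-weighted mean of the local weights W = 1 - E (assumed aggregation)."""
    ranges = [(min(r[i] for r in X), max(r[i] for r in X)) for i in subspace]
    cells = defaultdict(Counter)  # cell index tuple -> output class counts
    for row, label in zip(X, y):
        key = tuple(quantize(row[i], lo, hi, n_bins) for i, (lo, hi) in zip(subspace, ranges))
        cells[key][label] += 1
    n_classes = len(set(y))
    return sum(
        sum(c.values()) / len(X) * (1.0 - local_entropy(c, n_classes))
        for c in cells.values()
    )


def select_feature_sets(X, y, max_dim, n_bins, threshold):
    """Steps (a) and (d): score every input combination up to max_dim, keep those above threshold."""
    n_inputs = len(X[0])
    subspaces = [s for d in range(1, max_dim + 1) for s in combinations(range(n_inputs), d)]
    scores = {s: global_content(X, y, s, n_bins) for s in subspaces}
    return {s: g for s, g in scores.items() if g > threshold}
```

With X = [[0, 0], [1, 3], [3, 1], [4, 4]], y = [0, 0, 1, 1], two subranges per input and a threshold of 0.5, the subspaces (0,) and (0, 1) score 1.0 because every occupied cell is pure, while (1,) scores about 0 because both of its cells mix the two classes, so it is rejected.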
28. A computer-implemented method of defining a model of a system having measurable inputs and outputs from a data set that most accurately predicts system outputs from system inputs,
wherein a large number of input data points to the system and corresponding output data points from the system are acquired, the input and output data points are stored in a storage device, and the acquired input and output data points are grouped into at least one training data set and at least one test data set by selecting corresponding combinations of inputs and outputs of the system, the method comprising the steps of:
(a) creating a plurality of feature subspaces, each said feature subspace comprising a set of features from the training data set, each feature subspace having a dimension, wherein the dimension of a feature subspace is the number of inputs in the subspace,
(b) quantizing the inputs of the training data set, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing said feature subspace into a plurality of cells,
(c) determining the global level of informational content of each feature subspace by calculating at least one local cell Nishi-formulated entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E),
(d) selecting at least one feature set that has a global informational content above a predefined threshold, and
(e) searching over the plurality of feature subspaces of the training data set under a plurality of quantization conditions by repeating steps (b)-(d) to determine an optimum or near-optimum dimensionality and an optimum or near-optimum quantization condition of cells, the combination of which most accurately predicts system outputs from system inputs on the test data set, thereby defining a model.
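Step (e) of claim 28 can be sketched as an exhaustive search over input combinations and subrange counts, scoring each candidate by its accuracy on the test data set. This is a simplified, hypothetical illustration: it swaps in plain grid search for the evolutionary search mentioned in the abstract, skips the entropic filter of steps (c) and (d), and predicts with the majority output class of each training cell, falling back to the overall majority class for empty cells.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cell_key(row, subspace, ranges, n_bins):
    """Map a data point to its cell: one equal-width subrange index per input (assumed binning)."""
    key = []
    for i, (lo, hi) in zip(subspace, ranges):
        idx = 0 if hi == lo else int((row[i] - lo) / (hi - lo) * n_bins)
        key.append(min(max(idx, 0), n_bins - 1))  # clamp points outside the training range
    return tuple(key)


def fit_cells(X, y, subspace, n_bins):
    """Count the output classes of the training points that fall in each cell."""
    ranges = [(min(r[i] for r in X), max(r[i] for r in X)) for i in subspace]
    cells = defaultdict(Counter)
    for row, label in zip(X, y):
        cells[cell_key(row, subspace, ranges, n_bins)][label] += 1
    return ranges, cells


def accuracy(train, test, subspace, n_bins):
    """Fraction of test points whose output equals the majority class of their training cell."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    ranges, cells = fit_cells(X_tr, y_tr, subspace, n_bins)
    fallback = Counter(y_tr).most_common(1)[0][0]  # used for empty cells
    hits = 0
    for row, label in zip(X_te, y_te):
        counts = cells.get(cell_key(row, subspace, ranges, n_bins))
        pred = counts.most_common(1)[0][0] if counts else fallback
        hits += pred == label
    return hits / len(y_te)


def search_model(train, test, max_dim, bin_options):
    """Try every subspace up to max_dim under every quantization; keep the most accurate."""
    n_inputs = len(train[0][0])
    best = None
    for d in range(1, max_dim + 1):
        for subspace in combinations(range(n_inputs), d):
            for n_bins in bin_options:
                acc = accuracy(train, test, subspace, n_bins)
                if best is None or acc > best[0]:
                    best = (acc, subspace, n_bins)
    return best  # (test accuracy, subspace, number of subranges)
```

On the training set X = [[0, 0], [1, 3], [3, 1], [4, 4]], y = [0, 0, 1, 1] and the test set [[0.5, 2], [3.5, 0.5]] with outputs [0, 1], search_model(train, test, max_dim=2, bin_options=(2, 3)) returns (1.0, (0,), 2). Because only a strictly better score replaces the incumbent, ties go to the first candidate searched, here the lowest dimensionality.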
31. A computer-implemented method of defining a framework by selecting a group of models of a system having measurable inputs and outputs that most accurately predict system outputs from system inputs,
wherein a large number of input data points to the system and corresponding output data points from the system are acquired, the acquired input and output data points are stored in a storage device, and the acquired input and output data points are grouped into at least one training data set and at least one test data set by selecting corresponding combinations of inputs and outputs of the system, the method comprising the steps of:
(a) defining a feature subspace as a combination of one or more inputs, wherein the dimension of a feature is the number of inputs in the combination;
(b) determining a combination of feature subspaces having a global informational content above a predefined threshold by:
(i) selecting the training data set;
(ii) creating a plurality of feature subspaces from the training data set;
(iii) quantizing the inputs of the training data set with respect to each feature subspace, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing each feature subspace into a plurality of cells, each cell having a cell population being defined as the number of training set data points which occupy each cell;
(iv) determining the local Nishi-formulated informational entropy E of each cell in the subspace;
(v) using the local informational entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E), and using the local entropic weight W to determine the global informational content of each feature subspace;
(vi) determining a set of feature subspaces that have a global informational content above the predefined threshold;
(c) selecting a model comprising a set of feature subspaces that most accurately predicts system outputs from system inputs on the test data set;
(d) repeating steps (a)-(c) on different training and test data sets to define a group of models;
(e) creating a new training data set and a new test data set using individual model output-predicted values as inputs and actual output values as outputs; and
(f) selecting a subset group of optimum models from the group of models that most accurately predict system outputs from system inputs on the new test data set to define the framework.
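Steps (e) and (f) of claim 31 describe a form of stacking: individual model predictions become the inputs of a new data set whose outputs are the actual system outputs. The sketch below is an assumption-laden illustration, not the patented procedure: models are arbitrary Python callables, the second-level combiner is a lookup table that treats each tuple of predictions as a cell, unseen tuples fall back to a majority vote, and every subset of models is tried exhaustively. All names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def meta_data_set(models, X, y):
    """Step (e): model-predicted outputs become inputs; actual outputs stay outputs."""
    return [tuple(m(row) for m in models) for row in X], list(y)


def fit_combiner(meta_X, meta_y, subset):
    """Cell-lookup combiner (assumption): each distinct tuple of predictions is one cell."""
    table = defaultdict(Counter)
    for preds, actual in zip(meta_X, meta_y):
        table[tuple(preds[i] for i in subset)][actual] += 1
    return table


def combiner_accuracy(table, meta_X, meta_y, subset):
    hits = 0
    for preds, actual in zip(meta_X, meta_y):
        key = tuple(preds[i] for i in subset)
        counts = table.get(key) or Counter(key)  # unseen cell: fall back to a majority vote
        hits += counts.most_common(1)[0][0] == actual
    return hits / len(meta_y)


def select_framework(models, train, test):
    """Step (f): keep the subset of models whose combiner is most accurate on the new test set."""
    tr_X, tr_y = meta_data_set(models, *train)
    te_X, te_y = meta_data_set(models, *test)
    best = (-1.0, ())
    for k in range(1, len(models) + 1):
        for subset in combinations(range(len(models)), k):
            table = fit_combiner(tr_X, tr_y, subset)
            acc = combiner_accuracy(table, te_X, te_y, subset)
            if acc > best[0]:
                best = (acc, subset)
    return best  # (accuracy on the new test set, indices of the selected models)
```

With three hypothetical classifiers, one thresholding input 0, one thresholding input 1 and one constant, on a small split where only input 0 carries information, the selection keeps just the first model.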
34. A computer-implemented method of defining a super-framework of a system having measurable inputs and outputs by selecting a group of frameworks that most accurately predict system outputs from system inputs,
wherein a large number of input data points to the system and corresponding output data points from the system are acquired, the acquired input and output data points are stored in a storage device, and the acquired input and output data points are grouped into at least one training data set and at least one test data set by selecting corresponding combinations of inputs and outputs of the system, the method comprising the steps of:
(a) defining a feature subspace as a combination of one or more inputs, wherein the dimension of a feature subspace is the number of inputs in the combination;
(b) determining a combination of feature subspaces having a global informational content above a predefined threshold by:
(i) selecting the training data set,
(ii) creating an initial set of features from the training data set,
(iii) quantizing the inputs of the training data set, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing each feature subspace into a plurality of cells, the cells being defined by combinations of subranges of inputs, each cell having a cell population being defined as the number of training data set data points which occupy each cell,
(iv) determining the local Nishi-formulated informational entropy E of each cell in the subspace,
(v) using the local informational entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E), and using the local entropic weight W to determine the global informational content of each feature subspace,
(vi) determining a set of feature subspaces that have a global informational content above a predefined threshold;
(c) selecting a model comprising a combination of feature subspaces that most accurately predicts system outputs from system inputs on the test data set;
(d) repeating steps (a)-(c) on different training data sets and test data sets to define a group of models;
(e) creating a new training data set and a new test data set using individual model output-predicted values as inputs and actual output values as outputs;
(f) defining a framework by selecting a subset group of optimum models from the group of models that most accurately predict system outputs from system inputs on the new test data set;
(g) repeating steps (a)-(f) on different training data sets and test data sets to define a group of optimum frameworks;
(h) creating a new training data set and a new test data set using individual framework output-predicted values as inputs and actual output values as the outputs; and
(i) defining a super-framework by selecting a subset group of frameworks from the group of optimum frameworks that most accurately predict system outputs from system inputs on the new test data set.
37. A computer-implemented method of evolving a mathematical relationship between inputs and outputs in an empirical data set acquired from a system having measurable inputs and outputs,
wherein a large number of input data points to the system and corresponding output data points from the system are acquired, the acquired input and output data points are stored in a storage device, and the acquired input and output data points are grouped into at least one training data set and at least one test data set by selecting corresponding combinations of inputs and outputs of the system, the method comprising the steps of:
(a) defining a feature subspace as a combination of one or more inputs, wherein the dimension of a feature subspace is the number of inputs in the combination;
(b) determining a combination of feature subspaces having a global informational entropy above a predefined threshold by:
(i) selecting the training data set,
(ii) creating an initial set of feature subspaces from the training data set,
(iii) quantizing the inputs of the training data set, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing each feature subspace into a plurality of cells, each cell having a cell population being defined as the number of training set data points which occupy each cell,
(iv) determining the local Nishi-formulated informational entropy E of each cell in the subspace relative to each output of the subspace,
(v) using the local informational entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E), and using the local entropic weight W to determine the global informational entropy of each feature subspace,
(vi) selecting a set of feature subspaces that have a global informational entropy above the predefined threshold;
(c) selecting the feature subspace with the highest global informational entropy from the feature data set;
(d) creating a reduced-dimensionality data set by selecting only those inputs from the data set that are contained in the selected feature subspace; and
(e) applying a genetic programming method to evolve a mathematical relationship between the inputs and outputs of the reduced-dimensionality data set.
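Claim 37 ends by applying genetic programming to the reduced-dimensionality data set. As a rough illustration only, the sketch below pairs the step (d) projection with a deliberately minimal, mutation-only genetic programming loop (no crossover) over expression trees built from addition, subtraction, multiplication and constants, using mean squared error as fitness and elitist truncation selection. The claims specify none of these operators or parameters; they, and all function names, are assumptions.

```python
import operator
import random

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]


def reduce_dimensionality(X, subspace):
    """Step (d): keep only the inputs contained in the selected feature subspace."""
    return [[row[i] for i in subspace] for row in X]


def random_tree(n_vars, depth, rng):
    """Grow a random expression tree over variables x0..x{n_vars-1} and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", round(rng.uniform(-2.0, 2.0), 2))
    fn, sym = rng.choice(OPS)
    return ("op", fn, sym, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))


def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return tree[1](evaluate(tree[3], row), evaluate(tree[4], row))


def mse(tree, X, y):
    return sum((evaluate(tree, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def mutate(tree, n_vars, rng, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if tree[0] != "op" or rng.random() < 0.3:
        return random_tree(n_vars, depth, rng)
    kind, fn, sym, left, right = tree
    if rng.random() < 0.5:
        return (kind, fn, sym, mutate(left, n_vars, rng, depth), right)
    return (kind, fn, sym, left, mutate(right, n_vars, rng, depth))


def to_str(tree):
    if tree[0] == "x":
        return f"x{tree[1]}"
    if tree[0] == "c":
        return str(tree[1])
    return f"({to_str(tree[3])} {tree[2]} {to_str(tree[4])})"


def evolve(X, y, pop_size=60, generations=40, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, X, y))
        survivors = pop[: pop_size // 2]  # elitist truncation selection
        pop = survivors + [mutate(rng.choice(survivors), n_vars, rng) for _ in survivors]
    return min(pop, key=lambda t: mse(t, X, y))
```

Because the best half of each generation survives unchanged, the returned expression is never worse on the training data than the best member of the initial random population; to_str(best) renders it in infix form.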
38. A hybrid method of evolving a relationship between inputs and outputs in an empirical data set acquired from a system having measurable inputs and outputs, using the model generating method of one of claim, comprising the steps of:
(a) generating a first model from a data set;
(b) generating a second model using the same modeling method, by either:
(i) creating a plurality of feature subspaces different from those of the first model generating step, or
(ii) dividing the feature subspace into a different plurality of cells by quantizing the inputs differently from the first model generating step;
(c) dividing the data set into subsets and determining a local performance of each model in each subset;
(d) generating a weighting function based upon the local performance of the first and second models in each subset; and
(e) combining the first and second models using the weighting function, thereby combining the local performance advantages of each of the models.
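The local weighting of claim 38 can be sketched as follows. The partition of the data set, the error measure and the weighting function are not fixed by the claim, so this hypothetical example assumes numeric outputs, splits the data into equal-width regions along the first input, measures local performance as mean absolute error, and weights each model by its normalized inverse error within the region a new point falls in. The two models are arbitrary callables standing in for the first and second generated models.

```python
def region_of(row, lo, hi, n_regions):
    """Assign a point to an equal-width region along its first input (assumed partition)."""
    idx = 0 if hi == lo else int((row[0] - lo) / (hi - lo) * n_regions)
    return min(max(idx, 0), n_regions - 1)


def local_weights(models, X, y, n_regions, eps=1e-9):
    """Steps (c) and (d): per-region mean absolute error turned into normalized inverse-error weights."""
    lo, hi = min(r[0] for r in X), max(r[0] for r in X)
    errors = [[0.0] * len(models) for _ in range(n_regions)]
    counts = [0] * n_regions
    for row, target in zip(X, y):
        k = region_of(row, lo, hi, n_regions)
        counts[k] += 1
        for j, m in enumerate(models):
            errors[k][j] += abs(m(row) - target)
    weights = []
    for k in range(n_regions):
        if counts[k] == 0:  # no data in this region: weight the models equally
            weights.append([1.0 / len(models)] * len(models))
            continue
        inv = [1.0 / (e / counts[k] + eps) for e in errors[k]]
        weights.append([w / sum(inv) for w in inv])
    return (lo, hi), weights


def hybrid(models, X, y, n_regions):
    """Step (e): combine the models with the weights of the region each new point falls in."""
    (lo, hi), weights = local_weights(models, X, y, n_regions)

    def predict(row):
        w = weights[region_of(row, lo, hi, n_regions)]
        return sum(wj * m(row) for wj, m in zip(w, models))

    return predict
```

With one model that tracks its input (lambda r: r[0]) and one that always returns 10.0, fitted on X = [[0], [1], [9], [10]] and y = [0, 1, 10, 10] with two regions, the hybrid follows the first model near 0 and the second near 10, so p([0.5]) is about 0.5 and p([9.5]) is about 10.0.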
41. A machine-readable storage medium containing a set of instructions for causing a computing device to generate a model of a system using measurable inputs and measurable outputs of the system, said instructions causing the computing device to execute the steps of:
creating a plurality of feature subspaces, each said feature subspace comprising a set of features from data acquired from the system;
determining the global level of informational content of each feature subspace by calculating at least one local cell Nishi-formulated entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E);
searching the plurality of feature subspaces to locate feature subspaces having informational content above a predefined threshold, said located feature subspaces comprising combinations of one or more inputs;
searching a plurality of models, said models comprising one or more of said located feature subspaces, each of said models having an associated output prediction; and
selecting one of said models having an output prediction accuracy that is greater than that of at least one other model.
61. A machine-readable storage medium containing data structures, said data structures comprising:
a feature subspace data structure containing data representing a plurality of input combinations corresponding to a plurality of feature subspaces;
a model data structure containing data representing a plurality of feature subspace combinations;
a data structure containing data used to specify cell regions for each feature subspace;
a training data structure containing data representing the training data set needed to populate the feature subspaces; and
further containing a data structure containing entropic weights for each subspace, each entropic weight being based upon at least one local cell Nishi-formulated entropy E, each local entropic weight W being defined as the complement of the Nishi-formulated entropy E (W = 1 − E).
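One possible in-memory layout for the data structures of claim 61 is sketched below with Python dataclasses. The class and field names are hypothetical, cell regions are assumed to be stored as sorted interior boundaries per input, and the recorded local entropy is assumed to be normalized to [0, 1] so that W = 1 − E also lies in [0, 1].

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]  # one subrange index per input in the subspace


@dataclass
class FeatureSubspace:
    inputs: Tuple[int, ...]  # input combination, e.g. (0, 2)

    @property
    def dimension(self) -> int:
        return len(self.inputs)


@dataclass
class CellRegions:
    # input index -> sorted interior boundaries; k boundaries give k + 1 subranges
    edges: Dict[int, List[float]]

    def cell_of(self, row: List[float], subspace: FeatureSubspace) -> Cell:
        return tuple(bisect_right(self.edges[i], row[i]) for i in subspace.inputs)


@dataclass
class Model:
    subspaces: List[FeatureSubspace]  # a feature subspace combination


@dataclass
class TrainingData:
    X: List[List[float]]  # input data points
    y: List[int]          # corresponding outputs


@dataclass
class EntropicWeights:
    # subspace inputs -> cell -> local entropic weight W = 1 - E
    local: Dict[Tuple[int, ...], Dict[Cell, float]] = field(default_factory=dict)

    def record(self, subspace: FeatureSubspace, cell: Cell, entropy: float) -> None:
        if not 0.0 <= entropy <= 1.0:
            raise ValueError("expected a normalized entropy E in [0, 1]")
        self.local.setdefault(subspace.inputs, {})[cell] = 1.0 - entropy
```

For example, CellRegions({0: [2.0], 1: [1.0, 3.0]}).cell_of([3.0, 0.5], FeatureSubspace((0, 1))) returns (1, 0), and recording an entropy of 0.25 for that cell stores a local weight of 0.75.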
65. A machine-readable storage medium containing a plurality of data structures, said plurality of data structures being used to determine a system output prediction response to system input data points, said data structures comprising:
a mapping data structure containing data used to map an input data point to a cell prediction value, wherein the prediction values are weighted probability vectors;
a model data structure containing data representing a plurality of feature subspace combinations; and
a weighting data structure containing data representing local entropic weights W and global entropic weights, each entropic weight being based upon at least one local cell Nishi-formulated entropy E, each local entropic weight W being defined as the complement of the Nishi-formulated entropy E (W = 1 − E).
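The prediction path implied by claim 65 (map an input point to a cell, read that cell's probability vector, and weight it by entropic weights) can be sketched as a single function. The combination rule is an assumption: each feature subspace contributes its cell's probability vector scaled by the product of the local weight W and the subspace's global weight, the sum is renormalized, and empty cells contribute nothing. All parameter names are hypothetical.

```python
from bisect import bisect_right


def predict(row, model, edges, cell_probs, local_w, global_w, n_classes):
    """Hypothetical mapping from one input point to a weighted probability vector.

    model:      list of feature subspaces, each a tuple of input indices
    edges:      input index -> sorted interior subrange boundaries
    cell_probs: subspace -> cell -> probability vector over output classes
    local_w:    subspace -> cell -> local entropic weight W = 1 - E
    global_w:   subspace -> global entropic weight
    """
    total = [0.0] * n_classes
    for s in model:
        cell = tuple(bisect_right(edges[i], row[i]) for i in s)  # map the point to its cell
        probs = cell_probs[s].get(cell)
        if probs is None:  # no training data landed in this cell
            continue
        weight = local_w[s][cell] * global_w[s]
        for k in range(n_classes):
            total[k] += weight * probs[k]
    norm = sum(total)
    if norm == 0:
        return [1.0 / n_classes] * n_classes  # nothing informative: uniform vector
    return [t / norm for t in total]
```

If an uninformative subspace carries W = 0.5 and a global weight of 0.5, its uniform vector [0.5, 0.5] is scaled by 0.25, so combining it with a pure cell [0.0, 1.0] of weight 1 yields [0.1, 0.9].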
68. A machine-readable storage medium containing a set of instructions for causing a computing device to generate a model of a system using measurable inputs and measurable outputs of the system, wherein a large number of input data points to the system and corresponding output data points from the system are acquired to define a data set, said instructions causing the computing device to execute the steps of:
(a) creating a plurality of feature subspaces, each said feature subspace comprising a set of features from the data set,
(b) quantizing the inputs of the data set, the inputs having a range of values, by dividing the range of values into subranges, thereby dividing said feature subspace into a plurality of cells,
(c) determining the global level of informational content of each feature subspace by calculating at least one local cell Nishi-formulated entropy E to define a local entropic weight W as the complement of the Nishi-formulated entropy E (W = 1 − E), and
(d) selecting at least one feature set that has a global informational content above a predefined threshold.
Specification