System for data mining using neuroagents
First Claim
1. A data mining system comprising:
a study manager with which a discovery domain, an evaluation domain and a prediction domain can be selected from at least one data source, each data source including one or more data records, each data record having one or more parameters;
a knowledge model engine, coupled to the study manager, which:
(i) when presented with the discovery domain, constructs an explicitly predictive knowledge model in the form of a neuroagent network therefrom and returns a discovery results set, and
(ii) when presented with either of the evaluation or prediction domains, applies the neuroagent network thereto and returns an evaluation or prediction results set, respectively;
a discovery manager, coupled to the knowledge model engine, which takes the discovery results set from the knowledge model engine and calculates the relative significance of the parameters under the knowledge model;
an evaluation manager, coupled to the knowledge model engine, which takes the evaluation results set from the knowledge model engine and calculates the accuracy of the knowledge model; and
a prediction manager, coupled to the knowledge model engine, which takes the prediction results set from the knowledge model engine and calculates the predictions of the knowledge model.
Abstract
A neuroagent approach is used in an automated and unified data mining system to provide an explicitly predictive knowledge model. The neuroagent is a neural multi-agent approach based on macro-connectionism and comprises a double integration, at the association and symbolic level as well as at the knowledge model level. The data mining system permits discovery, evaluation and prediction of the correlative factors of data, i.e., the conjunctions, which correspond to neuroexpressions (semantic connections of neuroagents) connected to an output neuroagent that corresponds to the data output; the connection weights yield the relative significance of these factors to the given output. The system takes data sets called Domains, establishes candidate dimensions or Parameters, categorizes the Parameters into discrete bins, and trains a neuroagent network, composed of neuroagents allocated for each bin and each output, on a discovery data set called a Discovery Domain. Training builds up the various minimal and contextual neuroexpressions and sets the appropriate connection weights. The results may then be compared with an optional evaluation data set, called an Evaluation Domain, to establish the accuracy of the knowledge model, and the model may thereafter be applied with some degree of confidence to a prediction set, or Prediction Domain. The ranking in importance of the composite Parameters may be calculated, as well as the discrimination between the various outputs, which brings the factors of interest to a decision maker into focus.
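As an illustration of the binning step mentioned in the abstract, the following minimal sketch maps a numeric Parameter value to a discrete bin and names one neuroagent per occupied bin. The abstract does not say how bins are chosen; equal-width binning, the function names `assign_bin` and `neuroagents_for_record`, and the `"parameter:binN"` labels are all assumptions made here for illustration only.

```python
def assign_bin(value, lo, hi, n_bins):
    """Map a numeric value to an equal-width bin index in [0, n_bins - 1].

    Equal-width binning is an assumption; the abstract only says that
    Parameters are categorized into discrete bins.
    """
    if value is None:
        return None          # null values get no bin in this sketch
    if hi == lo:
        return 0             # degenerate range: a single bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)   # clamp the upper edge into the last bin


def neuroagents_for_record(record, ranges, n_bins):
    """Return hypothetical neuroagent labels, one per binned parameter value."""
    agents = set()
    for name, value in record.items():
        lo, hi = ranges[name]
        idx = assign_bin(value, lo, hi, n_bins)
        if idx is not None:
            agents.add(f"{name}:bin{idx}")
    return agents


ranges = {"age": (0, 100), "income": (0, 200)}
agents = neuroagents_for_record({"age": 50, "income": 200}, ranges, n_bins=4)
```

Here `agents` holds one label per Parameter of the record; a record with a null value simply contributes no neuroagent for that Parameter.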
33 Claims
21. A data mining system comprising:
a study manager with which a discovery domain can be selected from one or more data sources, each data source including at least one data record, each data record having one or more parameters;
a knowledge model engine, coupled to the study manager, which when presented with the discovery domain, constructs an explicitly predictive knowledge model in the form of a neuroagent network therefrom and returns a discovery results set; and
a discovery manager, coupled to the knowledge model engine, which takes the discovery results set from the knowledge model engine and calculates the relative significance of the parameters under the knowledge model.
22. A data mining system comprising:
a study manager with which a prediction domain can be selected from one or more data sources, each data source including at least one data record, each data record having one or more parameters;
a knowledge model engine, coupled to the study manager, which when presented with the prediction domain, applies an explicitly predictive knowledge model in the form of a neuroagent network thereto and returns a prediction results set; and
a prediction manager, coupled to the knowledge model engine, which takes the prediction results set from the knowledge model engine and calculates the predictions of the knowledge model.
23. In a data mining system wherein a discovery domain includes a plurality of parameters having one or more input parameters and one output parameter, the discovery domain selected from one or more data sources, each data source providing one or more data records, a method of creating meta data from the discovery domain comprising:
(i) for each parameter, input or output, determining the parameter's maximal value, minimal value, number of distinct values and number of null values;
(ii) if, for a parameter, the number of distinct values exceeds a first predetermined number, setting the number of distinct values to the first predetermined number;
(iii) for each parameter, creating a statistical meta data table, with a row allocated for each distinct value of the parameter, and the count of data records from the data sources having that distinct value, up to the first predetermined number; and
(iv) saving the statistical meta data tables for each of the parameters.
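Steps (i) to (iii) of claim 23 can be sketched in a few lines of Python. This is only an illustrative reading of the claim, not the patented implementation: the function name `build_meta_data`, the `max_distinct` argument (standing in for the "first predetermined number"), the use of `None` for null values, and the choice to keep the lowest-sorted values when the cap applies are all assumptions. Step (iv), saving the tables, is left to the caller.

```python
from collections import Counter


def build_meta_data(records, max_distinct=100):
    """Build per-parameter statistics from a list of dict records.

    `None` (or a missing key) is treated as a null value. Values of one
    parameter are assumed to be mutually comparable.
    """
    params = sorted({name for record in records for name in record})
    meta = {}
    for name in params:
        values = [record.get(name) for record in records]
        non_null = [v for v in values if v is not None]
        counts = Counter(non_null)

        # Step (ii): cap the number of distinct values.
        distinct = min(len(counts), max_distinct)

        # Step (iii): one row per distinct value with its record count,
        # up to the cap. Which rows survive the cap is an assumption.
        table = sorted(counts.items())[:distinct]

        # Step (i): maximal value, minimal value, distinct and null counts.
        meta[name] = {
            "max": max(non_null) if non_null else None,
            "min": min(non_null) if non_null else None,
            "distinct": distinct,
            "nulls": len(values) - len(non_null),
            "table": table,
        }
    return meta


records = [
    {"age": 30, "color": "red"},
    {"age": 40, "color": None},
    {"age": 30, "color": "blue"},
]
meta = build_meta_data(records, max_distinct=1)
```

With `max_distinct=1`, each table in `meta` is truncated to a single row while the minimum, maximum and null counts still reflect all records.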
29. A method of data mining using a discovery domain comprised of a plurality of data records, each data record comprised of a plurality of parameters, one of the plurality of parameters termed an output parameter, all other of the plurality of parameters termed input parameters, each parameter categorized into a plurality of data bins, a neuroagent initialized for each represented data bin of the plurality of data bins for each parameter, each neuroagent of an input parameter termed an input neuroagent, each neuroagent of an output parameter termed an output neuroagent, each neuroagent including a communication and activation envelope, the method comprising the steps of:
(i) creating a population meta data table for each parameter, in accordance with which one or more neuroagents are initialized;
(ii) for a first data record of the plurality of data records whose output parameter corresponds to a first output neuroagent,
(a) identifying in the first data record a first set of neuroagents corresponding to data bins represented in the first data record,
(b) forming a first neuroexpression from the first set of neuroagents, and
(c) connecting the first neuroexpression to the communication and activation envelope of the first output neuroagent;
(iii) for any subsequent second data record of the plurality of data records whose output parameter corresponds to the first output neuroagent, a current set of neuroexpressions being connected to the communication and activation envelope of the first output neuroagent, each of the current set of neuroexpressions including a current set of neuroagents,
(a) identifying a second set of neuroagents corresponding to data bins represented in the second data record,
(b) forming a second neuroexpression from the second set of neuroagents,
(c) forming a new neuroexpression from a difference of the second neuroexpression and a union of the current set of neuroexpressions,
(d) from each current neuroexpression of the current set of neuroexpressions, forming a common neuroexpression as an intersection of the current neuroexpression with the second neuroexpression, and a non-common neuroexpression as a difference of the current neuroexpression with the second neuroexpression, and
(e) from the communication and activation envelope of the first output neuroagent, disconnecting the current set of neuroexpressions and connecting the new neuroexpression and, for each of the current set of neuroexpressions, the common neuroexpression and the non-common neuroexpression, all the neuroexpressions so connected being designated as a new current set of neuroexpressions;
(iv) repeating step (iii) above for all the subsequent second data records; and
(v) for the network of neuroagents formed by steps (i)-(iv) above, calculating the importance of at least one input neuroagent towards at least one output neuroagent.
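The set operations in steps (ii) to (iv) of claim 29 can be illustrated with plain Python sets. In this hedged sketch a neuroexpression is modeled as a `frozenset` of neuroagent labels and the set connected to one output neuroagent as a `set` of such frozensets; these representations, the function names `first_record` and `update`, and the decision to drop empty neuroexpressions are assumptions. Connection weights and the importance calculation of step (v) are not modeled, since the claim does not give their formulas.

```python
def first_record(record_agents):
    """Step (ii): the first record yields a single neuroexpression."""
    return {frozenset(record_agents)}


def update(current, record_agents):
    """Step (iii): refine the current neuroexpressions with a new record.

    `current` is a set of frozensets; `record_agents` is an iterable of
    neuroagent labels found in the new record.
    """
    second = frozenset(record_agents)

    # (c) new neuroexpression: agents of the record not yet covered
    # by the union of the current neuroexpressions.
    covered = frozenset().union(*current)
    new_expr = second - covered

    updated = {new_expr}
    for expr in current:
        updated.add(expr & second)   # (d) common neuroexpression
        updated.add(expr - second)   # (d) non-common neuroexpression

    # Dropping empty neuroexpressions is an assumption of this sketch.
    updated.discard(frozenset())
    return updated


# Two records that share an output neuroagent (labels are hypothetical).
current = first_record({"age:bin3", "color:red", "size:L"})
current = update(current, {"age:bin3", "color:blue", "size:L"})
```

After the second record, the single original neuroexpression is split into the part shared by both records (`age:bin3`, `size:L`), the part seen only in the first record (`color:red`), and a new neuroexpression for the unseen agent (`color:blue`). Step (iv) corresponds to calling `update` once for each remaining record.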
Specification