System for discovering implicit relationships in data and a method of using the same
Abstract
A data processing and analysis system, and a method of using the same, for discovering implicit relationships in data. The method is executed in a computer system capable of receiving input data comprised of expert knowledge, empirical data, and user-defined constraints for any application domain. The system and method provide any pre-processing the input data may require, and perform feature selection and extraction on the input data. Further, the system and method generate a graphical representation of the implicit relationships in the input data, indicating relationships between both class variables and feature variables. Also generated is a classifier that provides a semantic and statistical justification of its classification results which further provides: statistical relevancy of the data set, including an indication of the undersampled regions of the data space; a data analysis specific to a desired level of confidence; and a sound decision theoretical foundation for classification thresholding. The system and method generate a classifier capable of classifying a sample with respect to any variable, handle missing data values, and provide a complete data analysis.
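The abstract's claims about classifying with respect to any variable, tolerating missing values, and thresholding at a desired confidence level can be illustrated with a minimal sketch. It assumes a naive Bayes factorization with Laplace smoothing, which the abstract does not specify; the function names, the `alpha` and `threshold` parameters, and the use of `None` to mark a missing value are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def fit_tables(rows, target):
    """Estimate a class prior and per-feature conditional counts from dict rows.
    Any column may serve as `target`; None marks a missing value (assumption)."""
    prior = Counter()
    cond = defaultdict(Counter)   # (feature, class) -> counts of feature values
    values = defaultdict(set)     # feature -> set of values seen in the data
    for row in rows:
        c = row.get(target)
        if c is None:
            continue              # rows with an unknown target are not counted
        prior[c] += 1
        for f, v in row.items():
            if f == target or v is None:
                continue          # a missing feature value contributes nothing
            cond[(f, c)][v] += 1
            values[f].add(v)
    return prior, cond, values

def posterior(sample, prior, cond, values, alpha=1.0):
    """P(class | observed features); missing features are left out of the product."""
    total = sum(prior.values())
    scores = {}
    for c, n_c in prior.items():
        p = n_c / total
        for f, v in sample.items():
            if v is None or f not in values:
                continue
            counts = cond[(f, c)]
            # Laplace-smoothed estimate of P(feature = v | class = c)
            p *= (counts[v] + alpha) / (sum(counts.values()) + alpha * len(values[f]))
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def classify(sample, prior, cond, values, threshold=0.8):
    """Return (label, posterior); label is None when confidence is below threshold."""
    post = posterior(sample, prior, cond, values)
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post
```

Because `target` is a parameter of `fit_tables`, the same data can be refitted to predict any other column, which is one simple reading of "classifying a sample with respect to any variable". Returning `None` below the threshold is a design choice for this sketch: it makes an abstention explicit instead of forcing a low-confidence label.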
3 Claims
1. A data processing and analysis system for discovering implicit relationships in data, comprising:
means for inputting empirical data, expert domain knowledge, domain conditions, and sample data;
a computing means for receiving the empirical data and expert domain knowledge from the means for inputting empirical data, the computing means having a memory means connected to a processing means, wherein the processing means stores the empirical data and expert domain knowledge in the memory means, pre-processes the empirical data and expert domain knowledge, selects and extracts features from the empirical data and expert-domain knowledge, generates correlation matrices, derives conditional probability tables, calculates posterior probabilities, creates a domain model, incorporating any user-defined domain conditions, for storage in the memory means, provides an output signal representative of the domain, and provides an output signal representing a classification of the sample data; and
means for receiving the output signal representative of the domain and the output signal representing the classification of the sample data, for graphically displaying the representation of the domain, and for displaying the classification of the sample data.
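Three of the processing steps recited in claim 1 (generating correlation matrices, selecting features, and deriving conditional probability tables) can be sketched as follows. The claim does not say how any of these is computed, so this is only one plausible reading: it assumes Pearson correlation over numeric columns, feature selection by a correlation cutoff against a chosen target, and frequency-count probability tables. All function names and the `min_abs_corr` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_matrix(columns):
    """columns: dict of name -> list of numbers. Returns a nested dict matrix."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def select_features(corr, target, min_abs_corr=0.5):
    """Keep variables whose absolute correlation with the target meets the cutoff."""
    return [f for f, r in corr[target].items()
            if f != target and abs(r) >= min_abs_corr]

def conditional_probability_table(columns, child, parent):
    """Estimate P(child | parent) from co-occurrence counts of discrete values."""
    counts = defaultdict(Counter)
    for p, c in zip(columns[parent], columns[child]):
        counts[p][c] += 1
    return {p: {c: n / sum(cs.values()) for c, n in cs.items()}
            for p, cs in counts.items()}
```

A short usage example: with `columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "y": [0, 0, 1, 1]}`, calling `select_features(correlation_matrix(columns), "y")` keeps only `"a"`, and `conditional_probability_table(columns, "y", "a")` gives the table for `y` conditioned on `a`.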
2. A data processing and analysis method in a computer system for discovering implicit relationships in data using an input means and a display means connected to a computing means having a memory means connected to a processing means, the method comprising the steps of:
inputting empirical data, expert domain knowledge, domain conditions, and sample data to the computing means, via the input means;
receiving the empirical data and expert domain knowledge in the computing means;
utilizing the processing means to store the empirical data and expert domain knowledge in the memory means, to pre-process the empirical data and expert domain knowledge, to select and extract features from the empirical data and expert domain knowledge, to generate correlation matrices, to derive conditional probability tables, to calculate posterior probabilities, to create a domain model, incorporating any user-defined domain conditions, for storage in the memory means, to provide an output signal representative of the domain, and to provide an output signal representing a classification of the sample data;
receiving the output signal representative of the domain and the output signal representing the classification of the sample data in the display means;
graphically displaying the representation of the domain on the display means; and
displaying the classification of the sample data on the display means.
3. A computer program product for use with a computer system for directing the system to discover implicit relationships in data, the computer program product comprising:
a computer readable medium;
means, provided on the computer readable medium, for directing the system to receive empirical data, expert domain knowledge, domain conditions, and sample data;
means, provided on the computer readable medium, for storing the empirical data and expert domain knowledge in the computer readable medium;
means, provided on the computer readable medium, for pre-processing the empirical data and expert domain knowledge;
means, provided on the computer readable medium, for selecting and extracting features from the empirical data and expert domain knowledge;
means, provided on the computer readable medium, for generating correlation matrices;
means, provided on the computer readable medium, for deriving conditional probability tables;
means, provided on the computer readable medium, for calculating posterior probabilities;
means, provided on the computer readable medium, for creating a domain model, incorporating any user-defined domain conditions, for storage in the computer readable medium;
means, provided on the computer readable medium, for providing an output signal representative of the domain, and an output signal representing a classification of the sample data;
means, provided on the computer readable medium, for receiving the output signal representative of the domain and the output signal representing the classification of the sample data in the display means;
means, provided on the computer readable medium, for graphically displaying the representation of the domain on a display means; and
means, provided on the computer readable medium, for displaying the classification of the sample data on the display means.
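The claims recite creating a domain model that incorporates user-defined domain conditions and displaying that model graphically, without fixing a representation. As a rough sketch under stated assumptions, the model below is an undirected graph with one edge per variable pair whose absolute correlation meets a cutoff, the domain conditions are lists of forbidden and required pairs, and the graphical output is Graphviz DOT text. The function names and the `threshold`, `forbidden`, and `required` parameters are hypothetical and are not taken from the patent.

```python
def build_domain_graph(corr, threshold=0.5, forbidden=(), required=()):
    """Build an undirected edge set from a nested-dict correlation matrix,
    then apply user conditions: drop forbidden pairs, force required pairs."""
    names = sorted(corr)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(corr[a][b]) >= threshold:
                edges.add(frozenset((a, b)))
    edges -= {frozenset(pair) for pair in forbidden}
    edges |= {frozenset(pair) for pair in required}
    return edges

def to_dot(edges):
    """Render the edge set as Graphviz DOT text for an undirected graph."""
    lines = ["graph domain {"]
    for a, b in sorted(tuple(sorted(edge)) for edge in edges):
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)
```

Applying the required pairs after the forbidden ones means a pair listed in both ends up in the graph; that ordering is an arbitrary choice for this sketch, and a real system would need an explicit rule for conflicting conditions.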
Specification