Data mining platform for knowledge discovery from heterogeneous data types and/or heterogeneous data sources
First Claim
1. A data mining platform for generating an output comprising knowledge discovered from analysis of a plurality of data sets comprising heterogeneous data types or data from heterogeneous data sources, wherein the data points within the data sets comprise a plurality of descriptive features of varied relevance to knowledge discovery, the platform comprising:
- a computer system programmed to implement a plurality of modules stored within a system memory, each module adapted for processing one data type of the plurality of heterogeneous data types, each module comprising:
(i) an input data source;
(ii) a data analysis engine;
(iii) a data output; and
(iv) a server connection for the input data source, the data analysis engine and the data output, wherein the data analysis engine comprises at least one processor for executing one or more support vector machines for generating a plurality of classes of data, and one or more feature subset ranking algorithms for ranking feature relevance to knowledge discovery from the plurality of data sets, wherein the at least one processor executes multiple runs of feature subset ranking on a plurality of data sets comprising one or more of sub-samples of the same data set, multiple data sets of heterogeneous data types, and heterogeneous data sources, to produce ranked lists of subsets of features with features having more relevance being ranked higher than features having less relevance, and wherein the at least one processor further validates an analysis obtained with one data type with the analysis obtained with another data type;
- a server connected to the server connection for communicating with each of the input data source, the data analysis engine and the data output and for providing means for monitoring one or more of the input data source, the data analysis engine, and the data output;
- a combined data analysis engine in communication with the server for combining the data output from the plurality of modules to generate a single output representing knowledge obtained from analyzing the plurality of heterogeneous data types; and
- a graphical user interface for receiving the results of the feature subset ranking and generating a display of organized results of the feature subset ranking.
Abstract
The data mining platform comprises a plurality of system modules, each formed from a plurality of components. Each module has an input data component, a data analysis engine for processing the input data, an output data component for outputting the results of the data analysis, and a web server to access and monitor the other modules within the unit and to provide communication to other units. Each module processes a different type of data, for example, a first module processes microarray (gene expression) data while a second module processes biomedical literature on the Internet for information supporting relationships between genes and diseases and gene functionality. In the preferred embodiment, the data analysis engine is a kernel-based learning machine, and in particular, one or more support vector machines (SVMs). The data analysis engine includes a pre-processing function for feature selection, for reducing the amount of data to be processed by selecting the optimum number of attributes, or “features”, relevant to the information to be discovered.
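The feature-selection step described in the abstract can be illustrated with a small sketch. It is not the patented method: instead of an SVM-based criterion it swaps in a simple signal-to-noise score, and it assumes binary labels (0/1), at least two examples per class, and numeric features. The function names `rank_features` and `ranked_lists_from_subsamples` and the parameters `runs`, `fraction`, and `seed` are hypothetical. It shows one way to run feature ranking several times on sub-samples of the same data set and collect one ranked list per run.

```python
import random
import statistics


def rank_features(samples, labels):
    """Return feature indices ordered from most to least relevant.

    Relevance is |mean_pos - mean_neg| / (std_pos + std_neg), a simple
    signal-to-noise score used here as a stand-in for an SVM-based criterion.
    """
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    scores = []
    for j in range(len(samples[0])):
        p = [x[j] for x in pos]
        n = [x[j] for x in neg]
        spread = statistics.pstdev(p) + statistics.pstdev(n)
        # Small constant avoids division by zero for constant features
        scores.append(abs(statistics.fmean(p) - statistics.fmean(n)) / (spread + 1e-9))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def ranked_lists_from_subsamples(samples, labels, runs=5, fraction=0.8, seed=0):
    """Run the ranking on several stratified sub-samples; one ranked list per run."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    lists = []
    for _ in range(runs):
        # Sample each class separately so both classes are always represented
        chosen = (rng.sample(pos_idx, max(2, int(fraction * len(pos_idx))))
                  + rng.sample(neg_idx, max(2, int(fraction * len(neg_idx)))))
        lists.append(rank_features([samples[i] for i in chosen],
                                   [labels[i] for i in chosen]))
    return lists
```

Each returned list orders feature indices with the more relevant features first, so the lists from different runs can be compared or combined downstream.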
28 Claims
1. Independent claim, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 21, 22, 23, 24, 26
13. A computer system for generating an output comprising knowledge discovered from analysis of a plurality of data sets comprising heterogeneous data types or data from heterogeneous data sources, wherein the data points within the data sets comprise a plurality of descriptive features of varied relevance to knowledge discovery, the system comprising:
- a plurality of modules, each module comprising software for processing one data type of the plurality of heterogeneous data types, each module comprising an input data source, a data analysis engine, a data output and a web server connection for each of the input data source, the data analysis engine and the data output, wherein the data analysis engine comprises at least one processor for executing one or more support vector machines for generating a plurality of classes of data, and one or more feature ranking algorithms, wherein the at least one processor executes multiple runs of feature ranking on a plurality of data sets comprising one or more of sub-samples of the same data set, multiple data sets of heterogeneous data types, and heterogeneous data sources, to produce ranked lists of features, wherein the at least one processor further executes an algorithm for organizing results of the feature ranking into a graph or map of features, and wherein the at least one processor executes multiple feature ranking algorithms on multiple data sets including heterogeneous data types or from heterogeneous data sources to produce ranked lists of features and further executes an algorithm to merge the ranked lists of features into a single ranked list of features;
- a web server connected to the web server connection for communicating with each of the input data source, the data analysis engine and the data output and for providing means for monitoring one or more of the input data source, the data analysis engine, and the data output;
- a combined data analysis engine in communication with the web server for combining the data output from the plurality of modules to generate a single output representing results obtained from analyzing the plurality of heterogeneous data types; and
- a graphical user interface for receiving the results of the feature ranking and generating a display of organized results of the feature ranking.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 25, 27, 28
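Claim 13 recites merging several ranked lists of features into a single ranked list but does not name a merging algorithm in the text above. As a minimal sketch, assuming a Borda count is an acceptable stand-in, the following shows one way such a merge could work. The function name `merge_ranked_lists` and the feature names `g1`, `g2`, `g3` are hypothetical.

```python
from collections import defaultdict


def merge_ranked_lists(ranked_lists):
    """Merge ranked feature lists into one list using a Borda count.

    Each list awards len(list) points to its top feature, one fewer to the
    next, and so on. A feature absent from a list earns nothing from it.
    This is an illustrative choice, not the algorithm named in the claims.
    """
    points = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            points[feature] += n - position
    # Highest total first; ties broken by feature name for determinism
    return sorted(points, key=lambda f: (-points[f], f))


# Hypothetical ranked lists, e.g. from runs on different data types
runs = [["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g1", "g3", "g2"]]
merged = merge_ranked_lists(runs)
```

A Borda count needs only the rank positions, which suits lists produced from heterogeneous data types whose raw relevance scores are not on a common scale.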
Specification