
Data mining platform for bioinformatics and other knowledge discovery

  • US 7,444,308 B2
  • Filed: 06/17/2002
  • Issued: 10/28/2008
  • Est. Priority Date: 06/15/2001
  • Status: Expired due to Term
First Claim

1. A data mining platform for generating an output comprising knowledge from analysis of a plurality of biological data sets, wherein the data sets include heterogeneous data types or the data sets come from heterogeneous data sources, the platform comprising:

  • a computer system programmed to implement a plurality of modules stored within a system memory, each module adapted for processing one data type of the plurality of heterogeneous data types, wherein at least one of the heterogeneous data types comprises gene expression data and gene expression coefficients for a plurality of genes, each module comprising an input data source, a data analysis engine, a data output and a server connection for each of the input data source, the data analysis engine and the data output, wherein the data analysis engine comprises at least one processor for executing one or more support vector machines for generating a plurality of classes of data and at least one margin between classes, and one or more feature subset ranking algorithms, wherein the at least one processor executes multiple runs of feature subset ranking on a plurality of data sets comprising one or more of sub-samples of the same data set, multiple data sets of heterogeneous data types, and heterogeneous data sources, to produce ranked lists of subsets of genes, wherein the at least one processor further executes an algorithm for organizing results of the feature subset ranking into a graph or map of genes and an algorithm to merge into a single graph a structure of features previously obtained, including ranked lists of subsets of features, ranked lists of features, or trees of features;

  • a server connected to the server connection for communicating with each of the input data source, the data analysis engine and the data output and for providing means for monitoring one or more of the input data source, the data analysis engine, and the data output;

  • a combined data analysis engine in communication with the server for combining the data output from the plurality of modules to generate a single output representing knowledge obtained from analyzing the plurality of heterogeneous data types; and

  • a graphical user interface for receiving the results of the feature subset ranking and generating a display of organized results of feature subset ranking to enable visualization of gene ranking for the plurality of genes.
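
The claim recites multiple runs of feature subset ranking that produce ranked lists of genes, followed by algorithms that organize and merge those results into a single graph. The claim does not state how the merging is performed, so the following Python sketch is only an illustration under stated assumptions: a Borda count is assumed for combining the ranked lists, a co-selection count is assumed for the graph edges, and the gene names and function names (`consensus_ranking`, `co_selection_graph`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def consensus_ranking(ranked_lists):
    """Merge ranked gene lists from several runs.

    Assumption: a Borda count is used; the claim does not name a method.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            # The top-ranked gene in a run earns the most points.
            scores[gene] += n - position
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores, key=lambda g: (-scores[g], g))


def co_selection_graph(ranked_lists, top_k):
    """Build an undirected graph of genes.

    Assumption: an edge weight counts the runs in which both genes
    appear among the top_k ranked genes.
    """
    edges = defaultdict(int)
    for ranking in ranked_lists:
        for a, b in combinations(sorted(ranking[:top_k]), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical ranked lists, e.g. from three sub-samples of one data set.
runs = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneD", "geneC"],
    ["geneA", "geneC", "geneB", "geneD"],
]

merged = consensus_ranking(runs)
graph = co_selection_graph(runs, top_k=2)
print(merged)
print(graph)
```

In this sketch, each ranked list would in practice come from one run of a feature ranking algorithm, such as a support vector machine based ranking; that step is omitted here, and only the merging and graph-building stages are shown.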
