Data mining platform for bioinformatics and other knowledge discovery

  • US 7,542,947 B2
  • Filed: 10/30/2007
  • Issued: 06/02/2009
  • Est. Priority Date: 05/01/1998
  • Status: Expired due to Fees
First Claim

1. A data mining platform for generating an output comprising knowledge from analysis of a plurality of biological data sets, wherein the data sets include heterogeneous data types or the data sets come from heterogeneous data sources, the platform comprising:

  • a computer system programmed to implement a plurality of modules stored within a system memory, each module configured for processing one data type of the plurality of heterogeneous data types, each module comprising an input data source, a data analysis engine, a data output and a web server connection for each of the input data source, the data analysis engine and the data output, wherein the computer system comprises at least one processor for executing as part of the data analysis engine of each module one or more support vector machines for generating a plurality of classes of data and at least one margin between classes;

    a web server connected to the web server connection of each module for communicating with each of the input data source, the data analysis engine and the data output of the corresponding module and for providing means for monitoring one or more of the input data source, the data analysis engine, and the data output;

    a combined data analysis engine in communication with the web server for combining the data output from the plurality of modules to generate a single output representing results obtained from analyzing the plurality of heterogeneous data types; and

    a graphical user interface for receiving the results and generating at a printer or display device a report of organized results;

    wherein the at least one processor executes multiple iterations of a feature subset ranking algorithm on a plurality of data sets comprising one or more of sub-samples of the same data set, multiple data sets of heterogeneous data types, and heterogeneous data sources, to produce ranked lists of feature subsets;

    wherein the heterogeneous data types comprise one or more data types selected from the group consisting of gene expression data, 2-D gel data, mass spectrometry data, antibody screening data, clinical observations, clinical history, physical and chemical measurements, genomic determinations, proteomic determinations, drug levels, hormonal and immunological tests, neurochemical or neurophysical measurements, mineral and vitamin level determinations, and genetic and familial histories, and wherein the heterogeneous data sources comprises one or more data sources selected from the group consisting of sensor instruments for collection of genomic data, sensor instruments for collection of proteomic data, sensor instruments for collection of physical and chemical measurements, clinical record databases, general internet search engines, on-line genetic databases, on-line proteomic databases, and on-line journals; and

    wherein the ranked lists of feature subsets comprise lists of genes or proteins.
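Illustrative note (not part of the claim): the claim recites running multiple iterations of a feature subset ranking algorithm over sub-samples of a data set, using support vector machines, to produce ranked lists of genes or proteins. The claim does not specify the ranking procedure, so the following is only a minimal sketch under stated assumptions: a linear SVM trained by plain stochastic subgradient descent on the hinge loss, features ranked by absolute weight, and a mean-position consensus across sub-samples. All function names, parameters, and the toy data are hypothetical and chosen for illustration.

```python
import random

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=50, seed=0):
    """Fit (w, b) by stochastic subgradient descent on the L2-regularized
    hinge loss. Labels must be +1 or -1. Assumed training scheme, not the
    patent's."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 shrinkage: smaller weights correspond to a wider margin.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:  # sample is misclassified or inside the margin
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def rank_features(w):
    """Feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

def ranked_lists_over_subsamples(X, y, n_iter=10, frac=0.75, seed=0):
    """One ranked feature list per random sub-sample of the same data set."""
    rng = random.Random(seed)
    n = len(X)
    k = max(2, int(frac * n))
    lists = []
    for it in range(n_iter):
        idx = rng.sample(range(n), k)
        w, _ = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=it)
        lists.append(rank_features(w))
    return lists

def consensus_ranking(lists):
    """Order features by mean position across lists (an assumed aggregation)."""
    n_features = len(lists[0])
    mean_pos = [sum(r.index(j) for r in lists) / len(lists)
                for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: mean_pos[j])

def make_toy_data(n=40, n_noise=4, seed=1):
    """Hypothetical data: feature 0 tracks the class label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [2.0 * label + rng.gauss(0.0, 0.5)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    lists = ranked_lists_over_subsamples(X, y)
    names = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]  # hypothetical labels
    print([names[j] for j in consensus_ranking(lists)])
```

In this sketch each sub-sample yields its own ranked list, which loosely corresponds to the "ranked lists of feature subsets" language of the claim. The consensus step is an added convenience and is not recited in the claim. For a linear SVM of this form the geometric margin width is 2/||w||, which is one way to read the claim's "margin between classes".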
