Directional expression-based scientific information knowledge management
Abstract
The present invention relates to methods, systems and apparatus for capturing, integrating, organizing, navigating and querying large-scale data from high-throughput biological and chemical assay platforms. It provides a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from different biological and chemical assays, data types and organisms, as well as systems to build and add to such an infrastructure. In particular, aspects of the invention relate to integrating, organizing, navigating and querying “directional” data, such as gene expression profiles.
18 Claims
1. A computer-implemented method of integrating expression-based data into a knowledge base stored on one or more storage devices, knowledge base comprising at least one pre-existing bi-directional feature set, each bi-directional feature set comprising a list of a plurality of features and, for at least some of the listed features, up or down expression information relative to a control sample or normal state, the computer-implemented method comprising:
receiving a bi-directional input feature set comprising a list of a plurality of features and, for at least some of the listed features, up or down regulation expression information of the feature relative to a control sample or normal state; and
automatically correlating by one or more processors of a computer system the input feature set with a plurality or all other pre-existing bi-directional feature sets;
wherein automatically correlating the input feature set with a bi-directional pre-existing feature set comprises determining by the one or more processors multiple individual correlation scores and, from the multiple individual correlation scores, determining by the one or more processors an overall correlation score and a correlation direction,
wherein an up regulation expression information indicates a positive correlation score between the input feature set and the pre-existing feature set, a down regulation expression information indicates a negative correlation score between the input feature set and the pre-existing feature set, and
wherein the input feature set and the pre-existing feature sets are selected from the group consisting of differential expression of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17
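As an illustration of the correlation step in claim 1, the following is a minimal sketch, not the patented method. The claim does not name a statistic, so the sketch assumes that each individual score is the negative log10 of a hypergeometric overlap p-value computed for one of four pairings (up/up, down/down, up/down, down/up), and that the overall score is the concordant total minus the discordant total. The function names, the dictionary layout and the `universe_size` parameter are all hypothetical.

```python
from math import comb, log10


def overlap_score(a, b, universe_size):
    """-log10 of the hypergeometric upper-tail probability of the observed overlap.

    Assumption: both sets are drawn from one universe of `universe_size` features.
    """
    k, n_a, n_b = len(a & b), len(a), len(b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    p = tail / comb(universe_size, n_b)
    if p <= 0:  # only reachable if the universe is smaller than the sets imply
        return float("inf")
    return max(0.0, -log10(p))


def split_by_direction(feature_set):
    """Split {feature: 'up' | 'down'} into (up_set, down_set)."""
    up = {f for f, d in feature_set.items() if d == "up"}
    down = {f for f, d in feature_set.items() if d == "down"}
    return up, down


def directional_correlation(input_fs, existing_fs, universe_size):
    """Return (individual_scores, overall_score, direction) for two feature sets."""
    in_up, in_down = split_by_direction(input_fs)
    ex_up, ex_down = split_by_direction(existing_fs)
    scores = {
        "up_up": overlap_score(in_up, ex_up, universe_size),
        "down_down": overlap_score(in_down, ex_down, universe_size),
        "up_down": overlap_score(in_up, ex_down, universe_size),
        "down_up": overlap_score(in_down, ex_up, universe_size),
    }
    # Concordant pairings argue for a positive correlation, discordant for negative.
    net = scores["up_up"] + scores["down_down"] - scores["up_down"] - scores["down_up"]
    direction = "positive" if net > 0 else "negative" if net < 0 else "none"
    return scores, abs(net), direction


# Toy data: feature names are placeholders, not taken from the patent.
study_a = {"A": "up", "B": "up", "C": "down", "D": "down"}
study_b = {"A": "down", "B": "down", "C": "up", "D": "up"}
_, same_score, same_dir = directional_correlation(study_a, study_a, 100)
_, flip_score, flip_dir = directional_correlation(study_a, study_b, 100)
print(same_dir, flip_dir)
```

Keeping the four individual scores separate makes it possible to report both the strength of the relationship and its sign, which is what the claim's "overall correlation score and a correlation direction" calls for.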
14. A computer implemented method of conducting a query in a knowledge base of chemical and/or biological information comprising a plurality of feature sets, each feature set comprising a list of a plurality of chemical or biological features and associated statistical information, the method comprising:
receiving a query identifying at least one feature set comprising up and down gene regulation expression information, wherein the query is received from a user input to a computer system;
using precomputed correlation scores between the at least one identified feature set and other content in the knowledge base to determine feature set rankings in reply to said query; and
presenting the user with a ranked list of feature sets as determined by the precomputed correlation scores, and, for at least some of the feature sets in the ranked list, an indication of whether the correlation of that feature set with the identified feature set is positive or negative; and
wherein an up gene regulation expression information indicates a positive correlation score between the identified feature set and the feature set in the knowledge base, a down gene regulation expression information indicates a negative correlation score between the identified feature set and the feature set in the knowledge base, and
wherein the identified feature set and the feature set in the knowledge base are selected from the group consisting of differential expression of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems.
Dependent claims: 15, 18
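The query flow of claim 14 can be sketched as a lookup over scores computed ahead of time. This is an illustrative sketch under stated assumptions: the precomputed scores are held in an in-memory dictionary keyed by a pair of feature-set identifiers, and each value is a `(score, direction)` tuple. The identifiers, the storage layout and the `rank_feature_sets` name are hypothetical and are not taken from the patent.

```python
def rank_feature_sets(query_id, precomputed, top_n=10):
    """Return up to `top_n` (feature_set_id, score, direction) tuples, best first.

    `precomputed` maps (query_id, other_id) -> (score, direction), where
    direction is "positive" or "negative".
    """
    hits = [
        (other_id, score, direction)
        for (q_id, other_id), (score, direction) in precomputed.items()
        if q_id == query_id
    ]
    # Rank by strength only; the sign is reported alongside each hit.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


# Hypothetical precomputed table.
precomputed = {
    ("fs_query", "fs_1"): (2.5, "positive"),
    ("fs_query", "fs_2"): (7.1, "negative"),
    ("fs_query", "fs_3"): (4.0, "positive"),
    ("fs_other", "fs_1"): (9.9, "positive"),
}
ranking = rank_feature_sets("fs_query", precomputed)
for feature_set_id, score, direction in ranking:
    print(f"{feature_set_id}\t{score:.1f}\t{direction}")
```

Because the scores are computed when a feature set is imported rather than at query time, answering a query reduces to a filter and a sort.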
16. A method of providing data to a knowledge base of scientific information, the method comprising:
(a) receiving raw data from one or more samples, wherein the raw data includes information on one or more features with indications of the magnitude and direction of change of those features, relative to a normal state or control sample, in response to a treatment or stimulus;
(b) producing an input feature set from the raw data by removing or reorganizing information about at least some less relevant features;
(c) correlating by one or more processors of a computer system the input feature set against a plurality or all of the pre-existing feature sets in the knowledge base;
(d) correlating by one or more processors of a computer system the input feature set against one or more feature groups in the knowledge base, wherein the feature groups provide collections of features having structural and/or functional characteristics in common; and
(e) storing on one or more storage devices correlation information generated in (c) and (d) for use in responding to queries involving feature groups or feature sets,
wherein the features are selected from the group consisting of biological entities, chemical entities, biological information and chemical information, and
wherein the input feature set and the pre-existing feature sets are selected from the group consisting of differential expression of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems.
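Steps (a) through (e) of claim 16 can be read as a small import pipeline. The sketch below is one possible reading, not the patent's implementation. It assumes the raw data is a mapping from feature name to a signed log2 fold change, that "less relevant" features are those below a fold-change threshold, that the Jaccard index stands in for the unspecified correlation measure, and that a plain dictionary stands in for the storage devices. All names and thresholds are hypothetical.

```python
def build_feature_set(raw, min_abs_log2fc=1.0):
    """Step (b): drop small changes, order by magnitude, keep only the direction."""
    kept = {f: v for f, v in raw.items() if abs(v) >= min_abs_log2fc}
    ordered = sorted(kept, key=lambda f: abs(kept[f]), reverse=True)
    return {f: ("up" if kept[f] > 0 else "down") for f in ordered}


def jaccard(a, b):
    """Overlap of two feature collections; a stand-in for the correlation measure."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def import_feature_set(name, raw, feature_sets, feature_groups, store):
    """Run steps (b) to (e) for one set of raw measurements received in step (a)."""
    fs = build_feature_set(raw)
    for other_name, other_fs in feature_sets.items():  # step (c)
        store[("set", name, other_name)] = jaccard(fs, other_fs)
    for group_name, members in feature_groups.items():  # step (d)
        store[("group", name, group_name)] = jaccard(fs, members)
    feature_sets[name] = fs  # later imports are correlated against this set too
    return fs  # step (e): `store` now holds the correlation information


# Toy inputs: gene symbols are used only as example feature names.
raw = {"TP53": 2.1, "MYC": -1.5, "GAPDH": 0.1}
feature_sets = {"study_1": {"TP53": "up", "EGFR": "up"}}
feature_groups = {"group_1": {"TP53", "MDM2"}}
store = {}
new_fs = import_feature_set("study_2", raw, feature_sets, feature_groups, store)
print(new_fs, store)
```

A directional measure could replace `jaccard` in step (c) without changing the shape of the pipeline; step (d) compares against feature groups, which carry membership but no direction.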
Specification