CATEGORIZATION AND FILTERING OF SCIENTIFIC DATA
Abstract
The present invention relates to methods, systems and apparatus for capturing, integrating, organizing, navigating and querying large-scale data from high-throughput biological and chemical assay platforms. It provides a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from different biological and chemical assays, data types and organisms, as well as systems to build and add to such an infrastructure. According to various embodiments, methods, systems and interfaces for associating experimental data, features and groups of data related by structure and/or function with chemical, medical and/or biological terms in an ontology or taxonomy are provided. According to various embodiments, methods, systems and interfaces for filtering data by data source information are provided, allowing dynamic navigation through large amounts of data to find the most relevant results for a particular query.
36 Claims
1. A computer-implemented method of correlating chemical and/or biological concepts with other information in a knowledge base, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said method comprising:
for each of a plurality or all of the concepts in the taxonomy, identifying feature sets that contribute to scoring the concept under consideration by identifying all feature sets associated with the concept under consideration and/or its child concepts; receiving pre-computed correlation scores and/or rank scores between the contributing feature sets and other information in the knowledge base; and calculating a score indicating correlation between the concept under consideration and other information in the knowledge base based on the precomputed correlations and/or rank scores.
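The concept-scoring loop of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the precomputed scores are combined, so this sketch assumes a simple mean. It also reads "child concepts" as all descendants in the hierarchy, which is an interpretation. All names (`children`, `concept_to_fs`, `precomputed`, `concept_score`) are hypothetical and do not come from the patent.

```python
def descendants(children, concept):
    """Return the concept plus all of its descendants (iterative depth-first walk)."""
    stack, seen = [concept], set()
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children.get(c, []))
    return seen


def contributing_feature_sets(children, concept_to_fs, concept):
    """Feature sets tagged with the concept itself or with any concept below it."""
    contributing = set()
    for c in descendants(children, concept):
        contributing.update(concept_to_fs.get(c, []))
    return contributing


def concept_score(children, concept_to_fs, precomputed, concept, target):
    """Correlation of `concept` with `target`, taken here as the mean of the
    precomputed (feature set, target) scores of the contributing feature sets.
    The mean is an assumption; the claim leaves the combination open."""
    scores = [
        precomputed[(fs, target)]
        for fs in contributing_feature_sets(children, concept_to_fs, concept)
        if (fs, target) in precomputed
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical taxonomy: Diseases > Cancer > Breast cancer.
children = {"Diseases": ["Cancer"], "Cancer": ["Breast cancer"]}
concept_to_fs = {"Cancer": ["fs1"], "Breast cancer": ["fs2"]}
precomputed = {("fs1", "fs9"): 0.8, ("fs2", "fs9"): 0.4}
score = concept_score(children, concept_to_fs, precomputed, "Diseases", "fs9")  # about 0.6
```

A mean does not grow with the number of rolled-up feature sets, so a broad parent concept does not automatically outscore its children. A sum or a maximum would be equally plausible readings of the claim.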
10. A computer program product comprising a machine readable medium on which is provided program instructions for correlating chemical and/or biological concepts with other information in a knowledge base, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said program instructions comprising:
code for, for each of a plurality or all of the concepts in the taxonomy, identifying feature sets that contribute to scoring the concept under consideration by identifying all feature sets associated with the concept under consideration and/or its child concepts; code for receiving pre-computed correlation scores and/or rank scores between the contributing feature sets and other information in the knowledge base; and code for calculating a score indicating correlation between the concept under consideration and other information in the knowledge base based on the precomputed correlations and/or rank scores.
11. A system for correlating chemical and/or biological concepts with other information in a knowledge base, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said system comprising:
a memory for storing the knowledge base; and one or more processors in communication with the memory for receiving pre-computed correlation scores and/or rank scores between the contributing feature sets and other information in the knowledge base and calculating a score indicating correlation between the concept under consideration and other information in the knowledge base based on the precomputed correlations and/or rank scores.
12. A computer-implemented method of correlating chemical and/or biological concepts in a knowledge base, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets each comprising at least one feature of chemical or biological information and associated statistical information wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said method comprising:
for each of a plurality or all of the concepts in the taxonomy, identifying feature sets that contribute to scoring the concept under consideration by identifying all feature sets associated with the concept under consideration and/or its child concepts; and for all or a plurality of unique concept pairs in the knowledge base, receiving correlation scores indicating the pair-wise correlations of the feature sets that contribute to the concept under consideration with the feature sets of at least one other concept in the knowledge base and calculating a score indicating the correlation between the concepts in the pair based on the pair-wise correlation scores.
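Claim 12 derives a concept-to-concept score from pairwise feature-set correlations. The minimal Python sketch below assumes the pairwise scores are already available in a dictionary keyed by unordered feature-set pairs. It also assumes that the concept-pair score is the mean of those scores; the claim does not fix the aggregation. The function and variable names are hypothetical.

```python
from itertools import combinations


def concept_pair_scores(concept_fs, pairwise):
    """concept_fs: concept -> set of contributing feature-set ids.
    pairwise: frozenset({fs_a, fs_b}) -> precomputed correlation score.
    Returns frozenset({concept_1, concept_2}) -> mean of the available
    pairwise scores (the mean is an assumption, not taken from the claim)."""
    result = {}
    # Each unordered concept pair is visited exactly once.
    for c1, c2 in combinations(sorted(concept_fs), 2):
        values = [
            pairwise[frozenset((a, b))]
            for a in concept_fs[c1]
            for b in concept_fs[c2]
            # Skip a feature set shared by both concepts (no self-correlation).
            if a != b and frozenset((a, b)) in pairwise
        ]
        if values:
            result[frozenset((c1, c2))] = sum(values) / len(values)
    return result


concept_fs = {"Liver": {"f1"}, "Hepatitis": {"f2", "f3"}}
pairwise = {frozenset(("f1", "f2")): 0.5, frozenset(("f1", "f3")): 0.1}
pairs = concept_pair_scores(concept_fs, pairwise)
```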
13. A computer-implemented method of conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and/or feature groups and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information, each feature group comprising a list of related features, and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
the method comprising:
receiving a query identifying one or more of said feature sets or feature groups, wherein the query is received from a user input to a computer system; using precomputed scores between the one or more feature sets or feature groups and concepts in the taxonomy to determine the most relevant concepts in response to said query; and presenting the user with a ranked list of concepts as determined by using the precomputed scores.
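The query flow of claim 13 amounts to a lookup in a precomputed concept scoring table followed by a sort. This Python sketch makes two illustrative assumptions: the table is a dictionary keyed by (feature set, concept), and a concept's relevance is the sum of its scores across the queried feature sets. `rank_concepts` is a hypothetical name.

```python
def rank_concepts(query_ids, concept_scores, top_n=10):
    """query_ids: feature-set or feature-group ids named in the user's query.
    concept_scores: (fs_id, concept) -> precomputed score.
    Summing a concept's scores over the query items is an assumption."""
    query_ids = set(query_ids)
    totals = {}
    for (fs_id, concept), score in concept_scores.items():
        if fs_id in query_ids:
            totals[concept] = totals.get(concept, 0.0) + score
    # Highest total first; ties broken alphabetically so the order is stable.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


concept_scores = {
    ("fs1", "Liver"): 2.0,
    ("fs1", "Lung"): 1.0,
    ("fs2", "Lung"): 1.5,
    ("fs3", "Brain"): 9.0,
}
ranked = rank_concepts(["fs1", "fs2"], concept_scores)  # [("Lung", 2.5), ("Liver", 2.0)]
```

Because the scores are precomputed, answering a query costs one pass over the table (or an index lookup in a real store) plus a sort. No correlation is computed at query time.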
16. A computer program product comprising a machine readable medium on which is provided program instructions for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and/or feature groups and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information, each feature group comprising a list of related features, and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category, said program instructions comprising:
code for receiving a query identifying one or more of said feature sets or feature groups, wherein the query is received from a user input to a computer system; code for using precomputed scores between the one or more feature sets or feature groups and concepts in the taxonomy to determine the most relevant concepts in response to said query; and code for presenting the user with a ranked list of concepts as determined by using the precomputed scores.
17. A system for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and/or feature groups and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information, each feature group comprising a list of related features, and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category, said system comprising:
a memory for storing the knowledge base; an interface designed or configured to receive a query identifying one or more of said feature sets or feature groups; one or more processors in communication with the memory designed or configured for using precomputed scores between the one or more identified feature sets or feature groups and concepts in the taxonomy to determine the most relevant concepts in response to said query; and an interface configured to present a user with a ranked list of concepts as determined by using the precomputed scores.
18. A computer-implemented method of conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
the method comprising:
receiving a query identifying one or more of said features, wherein the query is received from a user input to a computer system; using normalized ranks of features in feature sets associated with concepts in the taxonomy to determine the most relevant concepts in response to said query; and presenting the user with a ranked list of concepts as determined by using the normalized ranks.
19. The computer-implemented method of claim 18, wherein the at least one top level category comprises at least one of the group consisting of tissues or organs, diseases, and treatments.
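Claim 18 ranks concepts using normalized ranks of the queried features within tagged feature sets. One hypothetical normalization maps rank r in a set of n features to 1 - (r - 1)/n. The top feature then scores 1, and scores stay comparable across sets of different sizes. The formula, the summation across feature sets, and all names in this Python sketch are assumptions, not the patent's definition.

```python
def normalized_rank(rank, n):
    """Map rank 1..n (1 = most significant) into (0, 1]; the formula is an assumption."""
    return 1.0 - (rank - 1) / n


def rank_concepts_by_features(query_features, feature_sets, fs_concepts):
    """feature_sets: fs_id -> list of features ordered by significance.
    fs_concepts: fs_id -> concepts the feature set is tagged with.
    Each concept accumulates the normalized ranks of the queried features."""
    query_features = set(query_features)
    totals = {}
    for fs_id, ordered in feature_sets.items():
        n = len(ordered)
        for r, feature in enumerate(ordered, start=1):
            if feature in query_features:
                for concept in fs_concepts.get(fs_id, []):
                    totals[concept] = totals.get(concept, 0.0) + normalized_rank(r, n)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


feature_sets = {"fsA": ["TP53", "MYC", "EGFR", "KRAS"], "fsB": ["EGFR", "TP53"]}
fs_concepts = {"fsA": ["Lung"], "fsB": ["Breast"]}
ranked = rank_concepts_by_features({"TP53"}, feature_sets, fs_concepts)
# [("Lung", 1.0), ("Breast", 0.5)]
```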
21. A computer program product comprising a machine readable medium on which is provided program instructions for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
the program instructions comprising:
code for receiving a query identifying one or more of said features, wherein the query is received from a user input to a computer system; code for using normalized ranks of features in feature sets associated with concepts in the taxonomy to determine the most relevant concepts in response to said query; and code for presenting the user with a ranked list of concepts as determined by using the normalized ranks.
22. A system for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
said system comprising:
a memory for storing the knowledge base; and one or more processors in communication with the memory and designed or configured for receiving a query identifying one or more of said features, wherein the query is received from a user input to the system and for using normalized ranks of features in feature sets associated with concepts in the taxonomy to determine the most relevant concepts in response to said query; and an interface configured to present the user with a ranked list of concepts as determined by using the normalized ranks.
23. A computer-implemented method of conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
the method comprising:
receiving a query identifying one or more of said concepts, wherein the query is received from a user input to a computer system; using pre-computed scores indicating the correlation between concepts in the taxonomy to determine the most relevant concepts in response to said query; and presenting the user with a ranked list of concepts as determined by using the pre-computed scores.
24. A computer program product comprising a machine readable medium on which is provided program instructions for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
said program instructions comprising:
code for receiving a query identifying one or more of said concepts, wherein the query is received from a user input to a computer system; code for using pre-computed scores indicating the correlation between concepts in the taxonomy to determine the most relevant concepts in response to said query; and code for presenting the user with a ranked list of concepts as determined by using the pre-computed scores.
25. A system for conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and a taxonomy, each feature set comprising at least one feature of chemical or biological information and associated statistical information and the taxonomy comprising chemical and/or biological concepts arranged in a hierarchical structure having at least one top level category;
said system comprising:
a memory for storing the knowledge base; one or more processors in communication with the memory and designed or configured for receiving a query identifying one or more of said concepts, wherein the query is received from a user input to the system, and for using pre-computed scores indicating the correlation between concepts in the taxonomy to determine the most relevant concepts in response to said query; and an interface configured to present the user with a ranked list of concepts as determined by using the pre-computed scores.
26. A computer-implemented method of correlating chemical and/or biological concepts with other information in a knowledge base, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said method comprising:
for each of a plurality or all of the concepts in the taxonomy, identifying feature sets that contribute to scoring the concept under consideration by identifying all feature sets associated with the concept under consideration and/or its child concepts; receiving pre-computed correlation scores and/or rank scores between the contributing feature sets and other information in the knowledge base; and calculating a score indicating correlation between the concept under consideration and other information in the knowledge base based on the precomputed correlations and/or rank scores.
27. A knowledge base for storing, managing, organizing and querying data comprising scientific experiment information, said knowledge base comprising:
a plurality of feature sets, each feature set comprising at least one feature and associated statistical information; a taxonomy comprising a list of tags arranged in a hierarchical structure; and a concept scoring table comprising information about the correlation between at least some of the tags in the taxonomy and at least some of the features and/or feature sets.
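The three components recited in claim 27 can be modeled as plain data structures. This Python sketch uses an assumed layout: a parent map for the tag hierarchy and a dictionary for the concept scoring table. The class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureSet:
    fs_id: str
    # feature id -> associated statistic (e.g. a fold change); hypothetical layout
    stats: dict[str, float]


@dataclass
class KnowledgeBase:
    feature_sets: dict[str, FeatureSet] = field(default_factory=dict)
    # Taxonomy stored as tag -> parent tag; None marks a top-level category.
    parent: dict[str, Optional[str]] = field(default_factory=dict)
    # Concept scoring table: (tag, fs_id) -> correlation score.
    concept_scores: dict[tuple[str, str], float] = field(default_factory=dict)

    def top_level(self, tag: str) -> str:
        """Walk up the hierarchy to the tag's top-level category."""
        while self.parent.get(tag) is not None:
            tag = self.parent[tag]
        return tag

    def tags_for(self, fs_id: str) -> list[tuple[str, float]]:
        """Tags scored against one feature set, strongest correlation first."""
        hits = [(tag, s) for (tag, f), s in self.concept_scores.items() if f == fs_id]
        return sorted(hits, key=lambda ts: (-ts[1], ts[0]))


kb = KnowledgeBase(
    parent={"Breast cancer": "Cancer", "Cancer": "Diseases", "Diseases": None},
    concept_scores={("Cancer", "fs1"): 0.9, ("Liver", "fs1"): 0.2, ("Cancer", "fs2"): 0.5},
)
kb.feature_sets["fs1"] = FeatureSet("fs1", {"TP53": 2.3})
```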
32. A computer-implemented method of conducting a query in a knowledge base of chemical and/or biological information and comprising a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information as derived from chemical or biological experimental data and each feature group comprising a list of related features, the method comprising:
receiving a query identifying one or more of said feature sets or feature groups, wherein the query is received from a user input to a computer system; determining correlations between the identified feature sets or feature groups and other content in the knowledge base; presenting the user with results comprising a ranked list of feature sets, wherein a ranking of a resulting feature set indicates the degree of correlation to the identified feature sets or feature groups; and presenting the user with an indication of data sources of the experimental data associated with the resulting feature sets.
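Claim 32 leaves the correlation measure open. As a stand-in, this Python sketch uses Jaccard overlap between feature identifiers, which is a swapped-in technique and not the patent's statistic. It returns each matching feature set with its data source label so the user can see where the experimental data came from. The source labels and names are hypothetical.

```python
def query_feature_sets(query_features, feature_sets, sources):
    """feature_sets: fs_id -> set of feature ids.
    sources: fs_id -> data source label for the underlying experiment.
    Correlation here is Jaccard overlap, a stand-in for the unspecified statistic."""
    q = set(query_features)
    results = []
    for fs_id, feats in feature_sets.items():
        union = q | feats
        score = len(q & feats) / len(union) if union else 0.0
        if score > 0:
            results.append((fs_id, score, sources.get(fs_id, "unknown")))
    # Strongest overlap first; ties broken by id for a stable order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


feature_sets = {"fs1": {"A", "B"}, "fs2": {"A", "C", "D"}, "fs3": {"E"}}
sources = {"fs1": "public repository", "fs2": "internal lab"}
results = query_feature_sets({"A", "B"}, feature_sets, sources)
# [("fs1", 1.0, "public repository"), ("fs2", 0.25, "internal lab")]
```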
34. A computer-implemented method of providing data to a knowledge base of scientific information, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, the method comprising:
(a) receiving raw data from one or more samples, wherein the raw data includes information on one or more features with indications of one or more of: differential expression, abundance of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems;
(b) producing an input feature set from the raw data;
(c) correlating the input feature set against a plurality or all of the pre-existing feature sets in the knowledge base;
(d) correlating the input feature set against a plurality or all of the concepts in the taxonomy; and
(e) assigning a data authority level to a given concept-input feature set combination based on corroboration within the knowledge base that the given concept is significant to the experimental data represented by the input feature set.
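Step (e) of claim 34 assigns a data authority level based on corroboration within the knowledge base. This Python sketch implements one hypothetical reading. It counts the pre-existing feature sets that both correlate strongly with the input feature set and are tagged with the concept, then buckets that count into levels. The threshold values and the three-level scale are illustrative assumptions.

```python
def authority_level(concept, input_corr, fs_concepts,
                    corr_threshold=0.5, weak_min=1, strong_min=3):
    """input_corr: pre-existing fs_id -> correlation with the new input feature set.
    fs_concepts: pre-existing fs_id -> set of concepts that feature set is tagged with.
    Returns 0 (no corroboration), 1 (weak) or 2 (strong).
    All thresholds are hypothetical illustration values."""
    support = sum(
        1
        for fs_id, r in input_corr.items()
        if r >= corr_threshold and concept in fs_concepts.get(fs_id, set())
    )
    if support >= strong_min:
        return 2
    if support >= weak_min:
        return 1
    return 0


input_corr = {"f1": 0.9, "f2": 0.7, "f3": 0.2, "f4": 0.8}
fs_concepts = {"f1": {"liver"}, "f2": {"liver"}, "f3": {"liver"}, "f4": {"lung"}}
level = authority_level("liver", input_corr, fs_concepts)  # two supporting sets -> 1
```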
35. A computer program product comprising a machine readable medium on which is provided program instructions for providing data to a knowledge base of scientific information, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, the program instructions comprising:
(a) code for producing an input feature set from the raw data from one or more samples, wherein the raw data includes information on one or more features with indications of one or more of: differential expression, abundance of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems;
(b) code for correlating the input feature set against a plurality or all of the pre-existing feature sets in the knowledge base;
(c) code for correlating the input feature set against a plurality or all of the concepts in the taxonomy; and
(d) code for assigning a data authority level to a given concept-input feature set combination based on corroboration within the knowledge base that the given concept is significant to the experimental data represented by the input feature set.
36. A system for providing data to a knowledge base of scientific information, said knowledge base comprising 1) a taxonomy of biological and/or chemical concepts arranged in a hierarchical structure comprising at least one top-level category, 2) a plurality of feature sets and/or feature groups, each feature set comprising at least one feature of chemical or biological information and associated statistical information and each feature group comprising a list of related features wherein at least some of said feature sets and feature groups are associated with one or more concepts in the taxonomy, said system comprising:
(a) a memory to store the knowledge base; and
(b) one or more processors in communication with the memory and configured for receiving raw data from one or more samples, wherein the raw data includes information on one or more features with indications of one or more of: differential expression, abundance of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems;
producing an input feature set from the raw data;
correlating the input feature set against a plurality or all of the pre-existing feature sets in the knowledge base;
correlating the input feature set against a plurality or all of the concepts in the taxonomy; and
assigning a data authority level to a given concept-input feature set combination based on corroboration within the knowledge base that the given concept is significant to the experimental data represented by the input feature set.
Specification