System and method for scientific information knowledge management
First Claim
1. A method, implemented using one or more computers comprising one or more processors and system memory, of integrating data in a database of scientific information, the method comprising:
(a) receiving, by the one or more processors, an input feature set, said input feature set comprising a data structure comprising a table comprising (i) a list of input features and (ii) a list of associated statistical information, wherein the features comprise genes, SNPs, SNP patterns, portions of genes, regions of a genome, proteins, compounds, metabolites, or phenotypes;
(b) receiving, by the one or more processors, an index set comprising (i) a plurality of feature identifiers representing a plurality of features, and (ii) a plurality of globally unique mapping identifiers, wherein each feature identifier points to one or more globally unique mapping identifiers, two or more feature identifiers of the plurality of feature identifiers point to a same globally unique mapping identifier, the two or more feature identifiers are related to each other by at least one of: nomenclature-based, sequence-based, activity-based, regulatory-based, function-based, or structure-based relationships, and each globally unique mapping identifier has a unique address in the index set;
(c) automatically mapping, by the one or more processors, the input features in the input feature set to a subset of feature identifiers in the index set, wherein the subset of feature identifiers represents the input features and points to a subset of globally unique mapping identifiers in the index set, thereby providing first mapping information between the input features and the subset of globally unique mapping identifiers;
(d) providing, by the one or more processors, second mapping information between at least some pre-existing features of a plurality of pre-existing feature sets in the database and at least some of the subset of globally unique mapping identifiers, wherein the input feature set and the plurality of pre-existing feature sets are obtained from different experiments, platforms, or organisms;
(e) generating, by the one or more processors, an alignment scheme between the input feature set and the plurality of pre-existing feature sets in the database using the first mapping information and the second mapping information;
(f) automatically correlating, by the one or more processors, the input feature set with the plurality of pre-existing feature sets in the database using the alignment scheme; and
(g) automatically storing, by the one or more processors, the correlation information in (f) on a non-transitory machine readable medium for use in responding to queries involving feature sets.
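Steps (b) through (f) of claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the identifiers (`G0001`, `TP53`, `Trp53`), the study names, the dictionary layout of the index set, and in particular the Jaccard overlap used as the correlation score, since the claim does not name a specific statistic.

```python
# Hypothetical index set for step (b): feature identifier -> globally unique
# mapping identifiers. Several identifiers point to the same mapping ID.
INDEX_SET = {
    "TP53": {"G0001"},
    "p53": {"G0001"},     # alias of TP53 (nomenclature-based relationship)
    "Trp53": {"G0001"},   # mouse ortholog (sequence-based relationship)
    "BRCA1": {"G0002"},
    "Brca1": {"G0002"},
}


def map_features(feature_set, index_set):
    """Steps (c)/(d): map each feature to its mapping IDs; skip unknown features."""
    return {f: index_set[f] for f in feature_set if f in index_set}


def align(first_mapping, second_mapping):
    """Step (e): pair features from two sets that share a mapping identifier."""
    by_id = {}
    for feature, ids in second_mapping.items():
        for gid in ids:
            by_id.setdefault(gid, []).append(feature)
    pairs = []
    for feature, ids in first_mapping.items():
        for gid in sorted(ids):
            for other in by_id.get(gid, []):
                pairs.append((feature, other, gid))
    return pairs


def correlate(first_mapping, second_mapping):
    """Step (f): placeholder score, the Jaccard overlap of mapped IDs (assumed)."""
    a = set().union(*first_mapping.values())
    b = set().union(*second_mapping.values())
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Step (a): feature sets as tables of feature -> statistic (made-up fold changes).
input_set = {"TP53": 2.1, "BRCA1": -1.4, "XYZ9": 0.3}
existing_sets = {
    "mouse_study": {"Trp53": 1.8, "Brca1": -0.9},
    "other_study": {"p53": -2.0},
}

first = map_features(input_set, INDEX_SET)
scores = {}
for name, features in existing_sets.items():
    second = map_features(features, INDEX_SET)
    scores[name] = correlate(first, second)

mouse_pairs = align(first, map_features(existing_sets["mouse_study"], INDEX_SET))
```

In this sketch the human and mouse feature sets become comparable only because `TP53` and `Trp53` resolve to the same mapping identifier; the unmapped feature `XYZ9` is simply ignored. Step (g) would persist `scores` to storage, which is omitted here.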
Abstract
The present invention relates to methods, systems and apparatus for capturing, integrating, organizing, navigating and querying large-scale data from high-throughput biological and chemical assay platforms. It provides a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from different biological and chemical assays, data types and organisms, as well as systems to build and add to such an infrastructure.
21 Claims
8. A method, implemented using one or more computers comprising one or more processors and system memory, of conducting a query in a database comprising a plurality of feature sets or feature groups, each feature set comprising a plurality of features and associated statistical information and each feature group comprising a list of related features, the features being genes, SNPs, SNP patterns, portions of genes, regions of a genome, proteins, compounds, metabolites, or phenotypes, the method comprising:
receiving, by one or more processors of the one or more computers, a query identifying one or more feature sets or feature groups in the plurality of feature sets or feature groups, wherein the query is received from a user input to a computer system, and wherein the plurality of feature sets or feature groups was obtained from different experiments, studies, platforms, or organisms;
retrieving, by one or more processors of the one or more computers, precomputed correlation scores between the one or more feature sets or feature groups and other feature sets or feature groups in the database, wherein the precomputed correlation scores were computed by:
(a) receiving, by one or more processors of the one or more computers, an input feature set comprising a data structure comprising a table comprising (i) a list of input features and (ii) a list of associated statistical information;
(b) receiving, by one or more processors of the one or more computers, an index set comprising (i) a plurality of feature identifiers representing a plurality of features, and (ii) a plurality of globally unique mapping identifiers, wherein each feature identifier points to one or more globally unique mapping identifiers, two or more feature identifiers of the plurality of feature identifiers point to a same globally unique mapping identifier, the two or more feature identifiers are related to each other by at least one of: nomenclature-based, sequence-based, activity-based, regulatory-based, function-based, or structure-based relationships, and each globally unique mapping identifier has a unique address in the index set;
(c) automatically mapping, by one or more processors of the one or more computers, the input features in the input feature set to a subset of feature identifiers in the index set, wherein the subset of feature identifiers represents the input features and points to a subset of globally unique mapping identifiers in the index set, thereby providing first mapping information between the input features and the subset of globally unique mapping identifiers;
(d) providing, by one or more processors of the one or more computers, second mapping information between at least some pre-existing features of a plurality of pre-existing feature sets or feature groups in the database and at least some of the subset of globally unique mapping identifiers, wherein the plurality of pre-existing feature sets or feature groups comprises the other feature sets or feature groups in the database;
(e) generating, by one or more processors of the one or more computers, an alignment scheme between the input feature set and the plurality of pre-existing feature sets or feature groups in the database using the first mapping information and the second mapping information; and
(f) correlating, by one or more processors of the one or more computers, the input feature set with the plurality of pre-existing feature sets or feature groups in the database using the alignment scheme to generate the precomputed correlation scores between the one or more feature sets or feature groups and the other feature sets or feature groups in the database;
ranking, by one or more processors of the one or more computers, features, feature sets or feature groups using the precomputed correlation scores between the one or more feature sets or feature groups and the other feature sets or feature groups in the database; and
outputting on a display device a ranked list of the features, feature sets or feature groups as determined by the ranking using the precomputed correlation scores.
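The query flow of claim 8 (retrieve precomputed scores, rank, output) can be sketched as a simple lookup followed by a sort. The feature set names (`fs_query`, `fs_a`, and so on), the score values, and the pair-keyed dictionary used as the score store are all hypothetical; the claim does not prescribe a storage layout.

```python
# Hypothetical store of precomputed correlation scores, keyed by an
# unordered pair of feature set names. All values are made up.
PRECOMPUTED = {
    ("fs_query", "fs_a"): 0.42,
    ("fs_query", "fs_b"): 0.91,
    ("fs_c", "fs_query"): 0.67,
    ("fs_a", "fs_b"): 0.10,
}


def rank_related(query_id, scores):
    """Retrieve every score involving query_id and rank partners, best first."""
    hits = []
    for (left, right), score in scores.items():
        if left == query_id:
            hits.append((right, score))
        elif right == query_id:
            hits.append((left, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


ranked = rank_related("fs_query", PRECOMPUTED)

# "Outputting on a display device" is reduced to printing a ranked list.
for position, (feature_set, score) in enumerate(ranked, start=1):
    print(f"{position}. {feature_set} (score {score:.2f})")
```

Because the scores are computed ahead of time, answering a query in this sketch involves no mapping or correlation work, only retrieval and sorting.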
Specification