System and method for scientific information knowledge management

US 10,275,711 B2
Filed: 08/17/2012
Issued: 04/30/2019
Est. Priority Date: 12/16/2005
Status: Active Grant

Abstract

The present invention relates to methods, systems and apparatus for capturing, integrating, organizing, navigating and querying large-scale data from high-throughput biological and chemical assay platforms. It provides a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from different biological and chemical assays, data types and organisms, as well as systems to build and add to such an infrastructure.

21 Claims

1. A method, implemented using one or more computers comprising one or more processors and system memory, of integrating data in a database of scientific information, the method comprising:
- (a) receiving, by the one or more processors, an input feature set, said input feature set comprising a data structure comprising a table comprising (i) a list of input features and (ii) a list of associated statistical information, wherein the features comprise genes, SNPs, SNP patterns, portions of genes, regions of a genome, proteins, compounds, metabolites, or phenotypes;
  
  (b) receiving, by the one or more processors, an index set comprising (i) a plurality of feature identifiers representing a plurality of features, and (ii) a plurality of globally unique mapping identifiers, wherein each feature identifier points to one or more globally unique mapping identifiers, two or more feature identifiers of the plurality of feature identifiers point to a same globally unique mapping identifier, the two or more feature identifiers are related to each other by at least one of: nomenclature-based, sequence-based, activity-based, regulatory-based, function-based, or structure-based relationships, and each globally unique mapping identifier has a unique address in the index set;
  
  (c) automatically mapping, by the one or more processors, the input features in the input feature set to a subset of feature identifiers in the index set, wherein the subset of feature identifiers represents the input features and points to a subset of globally unique mapping identifiers in the index set, thereby providing first mapping information between the input features and the subset of globally unique mapping identifiers;
  
  (d) providing, by the one or more processors, second mapping information between at least some pre-existing features of a plurality of pre-existing feature sets in the database and at least some of the subset of globally unique mapping identifiers, wherein the input feature set and the plurality of pre-existing feature sets are obtained from different experiments, platforms, or organisms;
  
  (e) generating, by the one or more processors, an alignment scheme between the input feature set and the plurality of pre-existing feature sets in the database using the first mapping information and the second mapping information;
  
  (f) automatically correlating, by the one or more processors, the input feature set with the plurality of pre-existing feature sets in the database using the alignment scheme; and
  
  (g) automatically storing, by the one or more processors, the correlation information in (f) on a non-transitory machine readable medium for use in responding to queries involving feature sets.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, wherein the statistical information is selected from the group consisting of: p-values or fold changes indicating differential expression or abundance of at least some features, values indicating responses of at least some features to a treatment or stimulus, values indicating an association of at least some features with a phenotypic characteristic, and any combination thereof.
  - 3. The method of claim 1, wherein the input feature set and the pre-existing feature sets each include a list of features and associated statistical information.
  - 4. The method of claim 3, wherein the statistical information is selected from the group consisting of: p-values or fold changes indicating differential expression or abundance of at least some features, values indicating responses of at least some features to a treatment or stimulus, values indicating an association of at least some features with a phenotypic characteristic, and any combination thereof.
  - 5. The method of claim 1, wherein the two or more feature identifiers of (b) are related to each other by a sequence-based relationship.
  - 6. The method of claim 1, wherein the two or more feature identifiers of (b) are related to each other by a regulatory-based relationship.
  - 7. The method of claim 1, wherein the two or more feature identifiers of (b) are related to each other by a sequence-based relationship and a regulatory-based relationship.
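
Steps (a) through (g) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The index-set contents, the study names, the function names, the use of signed log fold changes as the statistic, and the sign-agreement score are all assumptions chosen for illustration; the claim itself does not prescribe a particular correlation measure.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical index set: feature identifier -> globally unique mapping IDs.
# Several identifiers (synonyms, database accessions) point to the same ID.
INDEX_SET = {
    "TP53": {"G1"},
    "p53": {"G1"},               # nomenclature-based relationship (illustrative)
    "ENSG00000141510": {"G1"},   # database accession for the same gene (illustrative)
    "BRCA1": {"G2"},
    "Brca1": {"G2"},             # alternate spelling (illustrative)
    "EGFR": {"G3"},
}


def map_features(feature_set, index_set):
    """Steps (c)/(d): map (feature, statistic) rows to global mapping IDs."""
    mapped = {}
    for feature, stat in feature_set:
        for gid in index_set.get(feature, ()):
            mapped[gid] = stat
    return mapped


def align(first, second):
    """Step (e): the alignment is the set of mapping IDs present in both sets."""
    return sorted(first.keys() & second.keys())


def correlate(first, second, shared):
    """Step (f): toy score, the fraction of shared IDs whose statistics agree in sign."""
    if not shared:
        return 0.0
    agree = sum(1 for g in shared if (first[g] > 0) == (second[g] > 0))
    return agree / len(shared)


def integrate(input_set, existing_sets, index_set, out_path):
    """Run steps (c)-(g) for one input feature set against all existing sets."""
    first = map_features(input_set, index_set)
    scores = {}
    for name, rows in existing_sets.items():
        second = map_features(rows, index_set)
        scores[name] = correlate(first, second, align(first, second))
    Path(out_path).write_text(json.dumps(scores, indent=2))  # step (g): persist
    return scores


# Step (a): an input feature set as a table of (feature, log fold change) rows.
input_set = [("p53", 2.1), ("Brca1", -1.4), ("EGFR", 0.8)]
existing_sets = {
    "study_A": [("TP53", 1.7), ("BRCA1", -0.9)],
    "study_B": [("ENSG00000141510", -2.0), ("EGFR", 1.1)],
}
out_file = Path(tempfile.gettempdir()) / "correlations.json"
scores = integrate(input_set, existing_sets, INDEX_SET, out_file)
print(scores)  # {'study_A': 1.0, 'study_B': 0.5}
```

The point of the sketch is that the two studies never share a feature name with the input, yet they become comparable once every name is resolved to a common mapping identifier.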

8. A method, implemented using one or more computers comprising one or more processors and system memory, of conducting a query in a database comprising a plurality of feature sets or feature groups, each feature set comprising a plurality of features and associated statistical information and each feature group comprising a list of related features, the features being genes, SNPs, SNP patterns, portions of genes, regions of a genome, proteins, compounds, metabolites, or phenotypes, the method comprising:
- receiving, by one or more processors of the one or more computers, a query identifying one or more feature sets or feature groups in the plurality of feature sets or feature groups, wherein the query is received from a user input to a computer system, and wherein the plurality of feature sets or feature groups was obtained from different experiments, studies, platforms, or organisms;
  
  retrieving, by one or more processors of the one or more computers, precomputed correlation scores between the one or more feature sets or feature groups and other feature sets or feature groups in the database, wherein the precomputed correlation scores were computed by:
  
  (a) receiving, by one or more processors of the one or more computers, an input feature set comprising a data structure comprising a table comprising (i) a list of input features and (ii) a list of associated statistical information;
  
  (b) receiving, by one or more processors of the one or more computers, an index set comprising (i) a plurality of feature identifiers representing a plurality of features, and (ii) a plurality of globally unique mapping identifiers, wherein each feature identifier points to one or more globally unique mapping identifiers, two or more feature identifiers of the plurality of feature identifiers point to a same globally unique mapping identifier, the two or more feature identifiers are related to each other by at least one of: nomenclature-based, sequence-based, activity-based, regulatory-based, function-based, or structure-based relationships, and each globally unique mapping identifier has a unique address in the index set;
  
  (c) automatically mapping, by one or more processors of the one or more computers, the input features in the input feature set to a subset of feature identifiers in the index set, wherein the subset of feature identifiers represents the input features and points to a subset of globally unique mapping identifiers in the index set, thereby providing first mapping information between the input features and the subset of globally unique mapping identifiers;
  
  (d) providing, by one or more processors of the one or more computers, second mapping information between at least some pre-existing features of a plurality of pre-existing feature sets or feature groups in the database and at least some of the subset of globally unique mapping identifiers, wherein the plurality of pre-existing feature sets or feature groups comprises the other feature sets or feature groups in the database;
  
  (e) generating, by one or more processors of the one or more computers, an alignment scheme between the input feature set and the plurality of pre-existing feature sets or feature groups in the database using the first mapping information and the second mapping information; and
  
  (f) correlating, by one or more processors of the one or more computers, the input feature set with the plurality of pre-existing feature sets or feature groups in the database using the alignment scheme to generate the precomputed correlation scores between the one or more feature sets or feature groups and the other feature sets or feature groups in the database;
  
  ranking, by one or more processors of the one or more computers, features, feature sets or feature groups using the precomputed correlation scores between the one or more feature sets or feature groups and the other feature sets or feature groups in the database; and
  
  outputting on a display device a ranked list of the features, feature sets or feature groups as determined by the ranking using the precomputed correlation scores.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
- - 9. The method of claim 8, further comprising receiving a field of search limiting content of the database against which the one or more feature sets or feature groups identified in the query are compared.
  - 10. The method of claim 8, wherein at least one identified feature group is queried across one or more feature sets and the user is presented with a ranked list of feature sets.
  - 11. The method of claim 8, wherein at least one identified feature set is queried across one or more feature sets and the user is presented with a ranked list of feature sets.
  - 12. The method of claim 8, wherein at least one identified feature set is queried across one or more feature groups and the user is presented with a ranked list of feature groups.
  - 13. The method of claim 8, wherein at least one identified feature set is queried across one or more features and the user is presented with a ranked list of features.
  - 14. The method of claim 8, wherein the query identifying one or more of said feature sets or feature groups is received via a user interface having regions for (a) inputting or selecting content for query, and (b) limiting a field of search within the database.
  - 15. The method of claim 8, wherein the features comprise genes of an organism.
  - 16. The method of claim 8, wherein the features comprise chemical compounds.
  - 17. The method of claim 8, wherein the features comprise SNPs.
  - 18. The method of claim 8, wherein the precomputed correlation scores are generated by performing a rank-based statistical algorithm.
  - 19. The method of claim 8, wherein the associated statistical information is selected from the group consisting of: p-values or fold changes indicating differential expression or abundance of at least some features, values indicating responses of at least some features to a treatment or stimulus, values indicating an association of at least some features with a phenotypic characteristic, and any combination thereof.
  - 20. The method of claim 8, wherein the database comprises a scoring table of the precomputed correlation scores, which are provided between each feature set or feature group and all other feature sets in the database.
  - 21. The method of claim 8, wherein the mapping information that associate features of the one or more feature sets or feature groups with features of the other feature sets or feature groups in the database comprises a plurality of globally unique mapping identifiers, and each globally unique mapping identifier represents a globally unique feature in the database.
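
The query flow of claim 8, together with the scoring table of claim 20, the field of search of claim 9, and the rank-based scoring of claim 18, can be sketched as follows. This is an illustration under stated assumptions, not the patented method. Spearman rank correlation is swapped in here as one well-known rank-based statistic, since the claims do not name a specific algorithm. The study names, mapping IDs, and function names are hypothetical, and the feature sets are assumed to be keyed by globally unique mapping IDs already.

```python
from itertools import combinations

# Hypothetical feature sets, already keyed by globally unique mapping IDs.
FEATURE_SETS = {
    "study_A": {"G1": 2.1, "G2": -1.4, "G3": 0.8},
    "study_B": {"G1": 1.7, "G2": -0.9, "G3": 0.2},
    "study_C": {"G1": -2.0, "G2": 1.1, "G3": 0.3},
}


def ranks(values):
    """Assign ranks 1..n by ascending value (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation over the mapping IDs shared by a and b."""
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        return 0.0
    ra = ranks([a[g] for g in shared])
    rb = ranks([b[g] for g in shared])
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


def build_scoring_table(feature_sets):
    """Precompute a score for every pair of feature sets (stored symmetrically)."""
    table = {}
    for x, y in combinations(sorted(feature_sets), 2):
        score = spearman(feature_sets[x], feature_sets[y])
        table[(x, y)] = table[(y, x)] = score
    return table


def query(table, name, field_of_search=None):
    """Look up precomputed scores for `name` and return them ranked, best first.

    `field_of_search` optionally restricts which other feature sets are compared.
    """
    hits = [
        (other, score)
        for (queried, other), score in table.items()
        if queried == name and (field_of_search is None or other in field_of_search)
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


table = build_scoring_table(FEATURE_SETS)
print(query(table, "study_A"))
# [('study_B', 1.0), ('study_C', -1.0)]
print(query(table, "study_A", field_of_search={"study_C"}))
# [('study_C', -1.0)]
```

Because the scores are computed once when a feature set is imported, answering a query reduces to a table lookup followed by a sort.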

Current Assignee: Illumina Incorporated
Original Assignee: NextBio
Inventors: Kupershmidt, Ilya; Su, Qiaojuan Jane; Andry, Francois
Primary Examiner(s): Huang, Miranda M
Assistant Examiner(s): Lamardo, Viker A
Application Number: US13/588,526
Publication Number: US 20130166599A1
Time in Patent Office: 2,447 Days
Field of Search: 706 60
US Class Current
CPC Class Codes:
- G06F 16/21 Design, administration or m...
- G06N 5/02 Knowledge representation; S...
