Method and systems for querying sequence-centric scientific information
Abstract
According to various embodiments, aspects of the invention provide a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from diverse sequencing technologies, different biological and chemical assays, data types, and organisms, together with systems to build and add to such an infrastructure. The methods, systems and apparatuses described enable combining orthogonal types of data and available public knowledge to elucidate mechanisms governing normal development and disease progression, as well as the susceptibility of individuals to disease or their response to drug treatments.
19 Claims
1. A computer implemented method of conducting a query in a database comprising a plurality of sequence-centric feature sets and gene-centric feature sets, the sequence-centric feature sets comprising genomic sequence regions and associated statistics based on experiments on samples containing the genomic sequence regions and the gene-centric feature sets comprising genes and associated statistics based on experiments on samples containing the genes, the method comprising:
(a) receiving, at a server computer comprising a processor and system memory, a query identifying a sequence-centric feature set or a gene-centric feature set, wherein the query is received from a user input to a query computer;
(b) receiving, by the server computer, correlation scores indicating correlations between the queried sequence-centric feature set or the queried gene-centric feature set and two or more feature sets in the database, wherein each correlation score was calculated by an iterative rank-based process comprising:
    obtaining a first feature set in the database corresponding to the queried sequence-centric feature set or the queried gene-centric feature set,
    obtaining a second feature set in the database,
    mapping one or more features in the first feature set to one or more features in the second feature set to provide mapping information between the first and second feature sets,
    obtaining ranks of the one or more features in the first feature set and ranks of the one or more features in the second feature set, and
    calculating the correlation score using (i) the mapping information between the first and second feature sets, and (ii) the ranks of the one or more features in the first feature set and the ranks of the one or more features in the second feature set, the correlation score indicating a correlation between the first feature set and the second feature set;
(c) calculating, by the server computer, feature set rankings for the two or more feature sets based on the correlation scores; and
(d) outputting, based on the feature set rankings and by the server computer, a ranked list of the two or more feature sets to be presented to the user.
Dependent claims: 2, 3, 15, 16, 17, 18, 19
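Steps (b) through (d) of claim 1 can be illustrated with a minimal Python sketch. The claim does not state which rank statistic is used, so this sketch substitutes Spearman's rank correlation purely for illustration; it is not the patented scoring method. The names `correlation_score` and `rank_feature_sets`, the dictionary layout (feature id to rank, with 1 as the top rank), the assumption of a one-to-one mapping with distinct ranks, and the descending sort order are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a feature set maps feature id -> rank (1 = top-ranked).
FeatureSet = Dict[str, int]


def _rerank(values: List[int]) -> List[int]:
    """Replace values with their positions 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def correlation_score(first: FeatureSet, second: FeatureSet,
                      mapping: Dict[str, str]) -> float:
    """Spearman's rho over the mapped features (illustrative stand-in statistic).

    `mapping` is assumed to be one-to-one: feature id in `first` -> feature id in `second`.
    """
    pairs = [(first[a], second[b]) for a, b in mapping.items()
             if a in first and b in second]
    n = len(pairs)
    if n < 2:
        return 0.0  # too few mapped features to correlate
    ranks_a = _rerank([p[0] for p in pairs])
    ranks_b = _rerank([p[1] for p in pairs])
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def rank_feature_sets(query_name: str,
                      database: Dict[str, FeatureSet],
                      mappings: Dict[Tuple[str, str], Dict[str, str]]
                      ) -> List[Tuple[str, float]]:
    """Score the queried set against every other set, then sort by score (descending)."""
    first = database[query_name]
    scores = []
    for name, second in database.items():
        if name == query_name:
            continue
        # Fall back to an identity mapping on shared feature ids (an assumption).
        mapping = mappings.get((query_name, name),
                               {f: f for f in first if f in second})
        scores.append((name, correlation_score(first, second, mapping)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Calling `rank_feature_sets("q", database, {})` on a small in-memory dictionary returns the other feature sets ordered from most to least correlated with the query. Sorting by absolute score instead would surface strong negative correlations as well; the claim leaves that choice open.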
4. A system comprising:
a memory for storing a database comprising a plurality of feature sets, wherein each feature set of the plurality of feature sets is either a sequence-centric feature set or a gene-centric feature set, wherein each sequence-centric feature set comprises a plurality of sequence regions and associated statistics, and each gene-centric feature set comprises a plurality of genes and associated statistics; and
one or more processors in communication with the memory, wherein the one or more processors are configured to:
(a) receive a query identifying a sequence-centric feature set or a gene-centric feature set, wherein the query is received from a user input to a query computer;
(b) receive, in response to said query, correlation scores indicating correlations between the queried sequence-centric feature set or the queried gene-centric feature set and two or more feature sets in the database, wherein each correlation score was calculated by an iterative rank-based process comprising:
    obtaining a first feature set in the database corresponding to the queried sequence-centric feature set or the queried gene-centric feature set,
    obtaining a second feature set in the database,
    mapping one or more features in the first feature set to one or more features in the second feature set to provide mapping information between the first and second feature sets,
    obtaining ranks of the one or more features in the first feature set and ranks of the one or more features in the second feature set, and
    calculating the correlation score using (i) the mapping information between the first and second feature sets, and (ii) the ranks of the one or more features in the first feature set and the ranks of the one or more features in the second feature set, the correlation score indicating a correlation between the first feature set and the second feature set;
(c) calculate feature set rankings for the two or more feature sets based on the correlation scores; and
(d) output, based on the feature set rankings, a ranked list of the two or more feature sets to be presented to the user.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13
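The stored data that claim 4 describes, and the mapping and rank-obtaining steps that precede scoring, might look roughly like the following Python sketch. Nothing here is taken from the specification. The `Region` class, the half-open coordinate convention, the use of coordinate overlap to map sequence regions to genes, and the ranking of features by descending absolute statistic are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """A genomic interval (hypothetical layout): 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def ranks_from_statistics(stats: Dict[str, float]) -> Dict[str, int]:
    """Rank features by descending absolute statistic; rank 1 is the strongest.

    Ranking by magnitude is an assumption; the claim only says ranks are obtained.
    """
    ordered = sorted(stats, key=lambda feature: abs(stats[feature]), reverse=True)
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def map_regions_to_genes(regions: Dict[str, Region],
                         genes: Dict[str, Region]) -> Dict[str, List[str]]:
    """Map each sequence region id to the ids of genes whose span it overlaps."""
    return {
        region_id: [gene_id for gene_id, span in genes.items()
                    if overlaps(region, span)]
        for region_id, region in regions.items()
    }
```

A region with no overlapping gene maps to an empty list, so a downstream scoring step can skip unmapped features instead of failing on them.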
14. A computer program product comprising a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method of querying a sequence-centric feature set or a gene-centric feature set against a database comprising a plurality of feature sets, wherein each feature set of the plurality of feature sets is either a sequence-centric feature set or a gene-centric feature set, each sequence-centric feature set comprises a plurality of sequence regions and associated statistics, and each gene-centric feature set comprises a plurality of genes and associated statistics, said program code comprising:
(a) code for receiving a query identifying a sequence-centric feature set or a gene-centric feature set, wherein the query is received from a user input to a query computer;
(b) code for correlating, in reply to said query, the queried sequence-centric feature set or the queried gene-centric feature set with two or more feature sets of the plurality of feature sets in the database, wherein the correlating comprises:
    obtaining a first feature set in the database corresponding to the queried sequence-centric feature set or the queried gene-centric feature set,
    obtaining a second feature set in the database,
    mapping one or more features in the first feature set to one or more features in the second feature set to provide mapping information between the first and second feature sets,
    obtaining ranks of the one or more features in the first feature set and ranks of the one or more features in the second feature set, and
    calculating the correlation score using (i) the mapping information between the first and second feature sets, and (ii) the ranks of the one or more features in the first feature set and the ranks of the one or more features in the second feature set, the correlation score indicating a correlation between the first feature set and the second feature set;
(c) code for calculating feature set rankings for the two or more feature sets based on the correlation scores; and
(d) code for outputting, based on the feature set rankings, a ranked list of the two or more feature sets to be presented to the user.
Specification