Sequence-centric scientific information management
First Claim
1. A computer program product comprising a machine readable non-transitory medium on which is provided program instructions for integrating an instant sequence-centric feature set into a database on a storage device comprising sequence-centric feature sets, the program instructions comprising:
receiving the instant sequence-centric feature set comprising a plurality of sequence regions and associated statistics, wherein each sequence region comprises a genomic sequence or a genomic region;
mapping the plurality of sequence regions of the instant sequence-centric feature set to other sequence regions within the database to provide a set of mapped sequence regions for the instant sequence-centric feature set, wherein the plurality of sequence regions and the other sequence regions within the database are related by genomic coordinate, physical proximity, haplotype, function, or phenotype;
determine ranks of the set of mapped sequence regions in the received sequence-centric feature set and other sequence-centric feature sets in the database, wherein each feature set of the other sequence-centric feature sets comprises a plurality of sequence regions and associated statistics, and wherein the ranks are based on statistics associated with the set of mapped sequence regions;
calculating sequence-sequence scores indicating correlations between the instant sequence-centric feature set and the sequence-centric feature sets in the database using the ranks of the set of mapped sequence regions; and
storing the instant sequence-centric feature set and the sequence-sequence scores in the database on the storage device.
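The mapping, ranking, and scoring steps recited above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not name a particular correlation measure, so a Spearman-style correlation of ranks is used here as a stand-in, and mapping is restricted to genomic-coordinate overlap (only one of the relationships the claim lists). The names `Region`, `map_regions`, `ranks`, and `rank_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic region: half-open interval [start, end) on a chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True when two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def map_regions(instant, stored):
    """Pair each instant region's statistic with that of the first stored
    region it overlaps (assumption: mapping by genomic coordinate only)."""
    pairs = []
    for region, stat in instant.items():
        for other, other_stat in stored.items():
            if overlaps(region, other):
                pairs.append((stat, other_stat))
                break
    return pairs


def ranks(values):
    """Rank values so that the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_score(instant, stored):
    """Assumed score: correlation of the ranks of mapped regions in two feature sets."""
    pairs = map_regions(instant, stored)
    if len(pairs) < 2:
        return 0.0
    return pearson(ranks([p[0] for p in pairs]), ranks([p[1] for p in pairs]))
```

Each feature set is passed as a dictionary from `Region` to its statistic (for example a fold change or p-value-derived score). Ranking before correlating means the score depends only on the ordering of the statistics, which is one plausible reason to base scores on ranks when feature sets come from different assay types with incomparable scales.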
Abstract
According to various embodiments, aspects of the invention provide a highly efficient meta-analysis infrastructure for performing research queries across a large number of studies and experiments from diverse sequencing technologies as well as different biological and chemical assays, data types and organisms, as well as systems to build and add to such an infrastructure. The methods, systems and apparatuses described enable combining orthogonal types of data and available public knowledge to elucidate mechanisms governing normal development, disease progression, as well as susceptibility of individuals to disease or response to drug treatments.
86 Citations
16 Claims
1. A computer program product comprising a machine readable non-transitory medium on which is provided program instructions for integrating an instant sequence-centric feature set into a database on a storage device comprising sequence-centric feature sets, the program instructions comprising:
receiving the instant sequence-centric feature set comprising a plurality of sequence regions and associated statistics, wherein each sequence region comprises a genomic sequence or a genomic region;
mapping the plurality of sequence regions of the instant sequence-centric feature set to other sequence regions within the database to provide a set of mapped sequence regions for the instant sequence-centric feature set, wherein the plurality of sequence regions and the other sequence regions within the database are related by genomic coordinate, physical proximity, haplotype, function, or phenotype;
determine ranks of the set of mapped sequence regions in the received sequence-centric feature set and other sequence-centric feature sets in the database, wherein each feature set of the other sequence-centric feature sets comprises a plurality of sequence regions and associated statistics, and wherein the ranks are based on statistics associated with the set of mapped sequence regions;
calculating sequence-sequence scores indicating correlations between the instant sequence-centric feature set and the sequence-centric feature sets in the database using the ranks of the set of mapped sequence regions; and
storing the instant sequence-centric feature set and the sequence-sequence scores in the database on the storage device.
Dependent claims: 2
3. A system for integrating an instant sequence-centric feature set into a database comprising sequence-centric feature sets, comprising:
a memory for storing a database of scientific information; and
one or more processors in communication with the memory and configured to:
receive the instant sequence-centric feature set comprising a plurality of sequence regions and associated statistics, wherein each sequence region comprises a genomic sequence or a genomic region;
map the plurality of sequence regions of the instant sequence-centric feature set to other sequence regions within the database to provide a set of mapped sequence regions for the instant sequence-centric feature set, wherein the plurality of sequence regions and the other sequence regions within the database are related by genomic coordinate, physical proximity, haplotype, function, or phenotype;
determine ranks of the set of mapped sequence regions in the received sequence-centric feature set and other sequence-centric feature sets in the database, wherein each feature set of the other sequence-centric feature sets comprises a plurality of sequence regions and associated statistics, and wherein the ranks are based on statistics associated with the set of mapped sequence regions;
calculate sequence-sequence scores indicating correlations between the instant sequence-centric feature set and the sequence-centric feature sets in the database using the ranks of the set of mapped sequence regions; and
store the instant sequence-centric feature set and the sequence-sequence scores in the database on the memory.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
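The final "store" step of the system claim could look roughly like the sketch below, which persists a feature set and its precomputed scores with Python's standard `sqlite3` module. The schema (tables `feature_set`, `region`, `set_score`) and the function `store_feature_set` are hypothetical choices made for illustration; the claims do not prescribe any particular database layout.

```python
import sqlite3

# Assumed schema: one row per feature set, one per region, one per pairwise score.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feature_set (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS region (
    feature_set_id INTEGER NOT NULL REFERENCES feature_set(id),
    chrom          TEXT    NOT NULL,
    start_pos      INTEGER NOT NULL,
    end_pos        INTEGER NOT NULL,
    statistic      REAL    NOT NULL
);
CREATE TABLE IF NOT EXISTS set_score (
    feature_set_id INTEGER NOT NULL REFERENCES feature_set(id),
    other_name     TEXT    NOT NULL,
    score          REAL    NOT NULL
);
"""


def store_feature_set(conn, name, regions, scores):
    """Store one feature set and its scores against existing feature sets.

    regions: iterable of (chrom, start, end, statistic) tuples.
    scores:  dict mapping the name of another feature set to a score.
    Returns the row id assigned to the new feature set.
    """
    conn.executescript(SCHEMA)
    with conn:  # commit on success, roll back if any insert fails
        cursor = conn.execute("INSERT INTO feature_set (name) VALUES (?)", (name,))
        feature_set_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO region (feature_set_id, chrom, start_pos, end_pos, statistic) "
            "VALUES (?, ?, ?, ?, ?)",
            [(feature_set_id, *row) for row in regions],
        )
        conn.executemany(
            "INSERT INTO set_score (feature_set_id, other_name, score) VALUES (?, ?, ?)",
            [(feature_set_id, other, score) for other, score in scores.items()],
        )
    return feature_set_id
```

Storing the scores alongside the feature set, rather than recomputing them per query, is consistent with the abstract's emphasis on efficient queries across many studies: a later lookup of related feature sets becomes a simple read from `set_score`.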
Specification