Browsable database for biological use
Abstract
The browsable database can allow for high-throughput analysis of protein sequences. One helpful feature may be a simplified ontology of protein function, which allows browsing of the database by biological function. Biologist curators may have associated the ontology terms with Hidden Markov Models (HMMs), rather than with individual sequences, so that the terms can be applied to additional sequences. To ensure accurate functional classification, HMMs may be constructed not only for families but also for curator-defined subfamilies, whenever family members have divergent functions or nomenclature. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, can be available for each family. Various versions of the browsable database may include training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs can be used to classify gene products across the entire genomes of human and Drosophila melanogaster.
30 Claims
1. A browsable database system for use with biological information, comprising:
at least one datastore of biological sequence data, including at least one of gene sequence data and protein sequence data;
an ontology of categories of biological functions mapped to statistical models trained on families of biological sequences related to the biological functions;
an input receptive of at least one user selection indicating a biological function of said ontology;
a recognizer adapted to identify multiple alignments of biological sequence data based on said sequence datastore and a statistical model related to a function indicated by the user selection; and
an output adapted to communicate the multiple alignments to a user providing the user selection.
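The elements of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim's "statistical model" is replaced here by a much simpler ungapped log-odds profile (not a Hidden Markov Model), and every name, sequence, and threshold below (`train_profile`, `recognize`, `kinase_fam`, the toy sequences, the cutoff of 2.0) is a hypothetical chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_profile(aligned_seqs, pseudocount=1.0):
    """Build a per-column log-odds profile from equal-length training sequences.

    This is a simplified stand-in for an HMM trained on a sequence family.
    """
    background = 1.0 / len(ALPHABET)
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_window(profile, seq):
    """Return (score, start) of the best ungapped window, or None if seq is too short."""
    width = len(profile)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(profile[i].get(seq[start + i], 0.0) for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def recognize(function_term, ontology, models, datastore, threshold=2.0):
    """For each model mapped to the selected function, collect matching segments.

    Returns {model_id: [(sequence_id, aligned_segment, score), ...]},
    sorted by descending score. The threshold is an arbitrary assumption.
    """
    alignments = {}
    for model_id in ontology.get(function_term, []):
        profile = models[model_id]
        rows = []
        for seq_id, seq in datastore.items():
            hit = best_window(profile, seq)
            if hit is not None and hit[0] > threshold:
                score, start = hit
                rows.append((seq_id, seq[start:start + len(profile)], round(score, 2)))
        rows.sort(key=lambda row: -row[2])
        alignments[model_id] = rows
    return alignments


# Hypothetical toy data: one ontology term mapped to one family model.
models = {"kinase_fam": train_profile(["GKGSF", "GRGAF", "GKGTF"])}
ontology = {"protein kinase": ["kinase_fam"]}
datastore = {"seqA": "MMGKGSFLL", "seqB": "AAAAAAAAA", "seqC": "PGRGAFW"}

# The "user selection" is the ontology term; the result is the output to communicate.
result = recognize("protein kinase", ontology, models, datastore)
for seq_id, segment, score in result["kinase_fam"]:
    print(seq_id, segment, score)
```

In this sketch the ontology is a plain mapping from function terms to model identifiers, so the annotation attaches to the model rather than to any individual sequence; a new sequence added to the datastore is picked up the next time the term is selected.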
11. A method of operation for use with a browsable biological database, comprising:
communicating an ontology of categories of biological functions to a user, wherein the biological functions are mapped to statistical models trained on families of biological sequences related to the biological functions;
receiving at least one user selection indicating a biological function of the ontology;
accessing at least one sequence datastore of biological sequence data, including at least one of gene sequence data and protein sequence data;
employing pattern recognition to identify multiple alignments of biological sequence data based on contents of the sequence datastore and a statistical model related to a function indicated by the user selection; and
communicating the multiple alignments to the user providing the user selection.
21. A method for constructing a browsable database for use with biological information, comprising:
clustering biological sequences into families based on global sequence similarity, wherein the biological sequences include at least one of protein sequences and gene sequences;
aligning the families by generating statistical models based on biological sequence clusters associated with the families; and
dividing the families into subfamilies of sequences sharing a common functional attribute, including at least one of molecular function and biological process.
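The clustering and subdivision steps of claim 21 can be sketched as follows. This is an illustration under stated assumptions, not the method the patent describes: global sequence similarity is approximated with the `difflib.SequenceMatcher` ratio from the Python standard library rather than a true global alignment score, families are formed by single-linkage clustering, the 0.6 cutoff is arbitrary, and the sequences and curator annotations are invented. The intermediate model-building step is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude whole-sequence similarity in [0, 1]; a stand-in for a global alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_families(seqs, min_similarity=0.6):
    """Single-linkage clustering: sequences linked by any pair above the cutoff share a family."""
    ids = list(seqs)
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if similarity(seqs[a], seqs[b]) >= min_similarity:
                parent[find(a)] = find(b)

    families = defaultdict(list)
    for seq_id in ids:
        families[find(seq_id)].append(seq_id)
    return [sorted(members) for members in families.values()]


def split_subfamilies(family, annotations):
    """Group family members by a shared functional attribute assigned by a curator."""
    subfamilies = defaultdict(list)
    for seq_id in family:
        subfamilies[annotations.get(seq_id, "unclassified")].append(seq_id)
    return dict(subfamilies)


# Hypothetical toy sequences and curator annotations.
seqs = {
    "k1": "MGKGSFGKVY",
    "k2": "MGKGAFGKVY",
    "k3": "MGRGSFGEVY",
    "p1": "HCDLSWQNRT",
    "p2": "HCDLSWQNKT",
}
annotations = {
    "k1": "serine/threonine kinase",
    "k2": "serine/threonine kinase",
    "k3": "tyrosine kinase",
}

families = sorted(cluster_families(seqs))
subfamilies = split_subfamilies(families[0], annotations)
print(families)
print(subfamilies)
```

Keeping the subfamily split separate from the clustering reflects the claim's ordering: families come from sequence similarity alone, while subfamilies depend on functional attributes such as molecular function or biological process.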
Specification