System and method for determining matching patterns within gene expression data
First Claim
1. In a computer network, a method for determining patterns within gene expression data stored in a database containing biological data, the method comprising:
(a) defining a plurality of sample nodes within the database, each sample node comprising a curated data set comprising a set of pre-formatted and pre-computed biological data obtained from at least one biological sample, wherein the plurality of sample nodes are organized in a hierarchical arrangement according to clinical relevance;
(b) assigning a set of clinical attributes to each sample node, the set of clinical attributes including at least one taxonomy designation selected from the group consisting of tissues, diseases, medications and sample parameters;
(c) providing a user interface for entry of a search query into the computer processor and displaying search results at a user interface;
(d) prompting entry of the search query by requesting user selection of a search category from the group consisting of biological materials, biological material family, biological pathways, and sample set taxonomy, and wherein each sample node of the plurality of sample nodes is associated with a plurality of search categories;
(e) searching the plurality of sample nodes for data responsive to the search query;
(f) selecting one or more sample nodes containing the data responsive to the search query;
(g) saving search results comprising the set of pre-formatted and pre-computed biological data responsive to the one or more selected sample nodes;
(h) receiving a user interface selection of an algorithm for performing gene expression pattern matching for identifying genes or gene fragments within the one or more selected sample nodes that have similar gene expression patterns to a gene of interest, the algorithm comprising;
(i) computing a plurality of pairwise comparisons between the gene of interest and the genes or gene fragments within the one or more sample nodes, wherein each comparison is encoded using a qualitative three-state encoding scheme, wherein up-regulation of gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a first symbol, down-regulation of gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a second symbol different from the first symbol and no change in gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a third symbol different from the first and second symbols wherein the three-state encoding scheme comprises a non-quantitative indication of gene behavior;
(ii) generating a three-by-three contingency matrix for each pairwise comparison using the three-state encoding scheme;
(iii) determining a distance score for each pairwise comparison;
(iv) generating a listing of lowest distance scores, wherein the lowest distance scores correspond to genes or gene fragments having the highest similarity to the gene of interest; and
(i) generating an output display comprising the listing of genes or gene fragments having the lowest distance scores.
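Steps (h)(i) through (h)(iv) can be illustrated with a minimal Python sketch. Several details here are assumptions, since the claim does not fix them: the symbols `+`, `-` and `0` stand in for the first, second and third symbols; each gene is represented as a string of such symbols, one per condition; and the distance score is taken to be the fraction of conditions in which the two genes disagree (the off-diagonal share of the contingency matrix). The function names and gene labels are hypothetical.

```python
from collections import Counter

# Illustrative symbols: up-regulated, down-regulated, no change.
STATES = ("+", "-", "0")


def contingency_matrix(profile_a, profile_b):
    """Count co-occurring states of two genes across the same conditions (3x3)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same conditions")
    counts = Counter(zip(profile_a, profile_b))
    # Rows follow the state of the first gene, columns the state of the second.
    return [[counts[(row, col)] for col in STATES] for row in STATES]


def distance_score(matrix):
    """Assumed metric: fraction of conditions where the two states disagree."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("empty profiles")
    agree = sum(matrix[i][i] for i in range(len(STATES)))
    return 1.0 - agree / total


def rank_similar_genes(gene_of_interest, candidates, top_n=3):
    """Return (gene, distance) pairs, lowest distance (most similar) first."""
    scored = [
        (name, distance_score(contingency_matrix(gene_of_interest, profile)))
        for name, profile in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])[:top_n]


# Hypothetical encoded profiles over five conditions.
target = "++0-0"
candidates = {"geneA": "++0-0", "geneB": "+-0-0", "geneC": "--0++"}
ranking = rank_similar_genes(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the encoding is qualitative, the sketch never looks at expression magnitudes; any other distance derived from the three-by-three matrix could be substituted for `distance_score` without changing the rest of the flow.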
Abstract
A computer-based system and method are provided for retrieving information from a number of data sources on a computer network containing biological data. The network database is organized in a b-tree configuration having a plurality of sample nodes. Each sample node includes a curated data set of pre-formatted and pre-computed summary biological data obtained from at least one biological sample. The plurality of sample nodes are organized in a hierarchical arrangement according to clinical relevance. A set of attributes is assigned to each sample node to facilitate navigation through the database using a browser accessible through a graphical user interface. The set of attributes includes at least one taxonomy designation selected from the group including tissues, diseases, medications and sample parameters. Search results that are produced include automated reports of the summary biological data stored in the sample nodes and custom reports generated using the summary biological data.
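The hierarchy of attributed sample nodes described in the abstract can be pictured with a small Python sketch. This is only an illustration under stated assumptions: it uses a plain in-memory tree rather than the b-tree configuration the abstract mentions, and the class name `SampleNode`, the function `find_nodes`, the attribute keys and the node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SampleNode:
    """One node in the hierarchy, carrying taxonomy attributes and summary data."""
    name: str
    attributes: dict = field(default_factory=dict)    # e.g. {"tissue": "liver"}
    summary_data: dict = field(default_factory=dict)  # pre-computed values would go here
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)
        return child


def find_nodes(root, **criteria):
    """Depth-first, pre-order search for nodes matching every given attribute."""
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        if all(node.attributes.get(key) == value for key, value in criteria.items()):
            matches.append(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return matches


# Hypothetical hierarchy: tissue at the first level, disease state below it.
root = SampleNode("all samples")
liver = root.add_child(SampleNode("liver", {"tissue": "liver"}))
liver.add_child(SampleNode("liver / cirrhosis", {"tissue": "liver", "disease": "cirrhosis"}))
liver.add_child(SampleNode("liver / normal", {"tissue": "liver", "disease": "none"}))
root.add_child(SampleNode("kidney", {"tissue": "kidney"}))

liver_hits = [node.name for node in find_nodes(root, tissue="liver")]
cirrhosis_hits = [node.name for node in find_nodes(root, tissue="liver", disease="cirrhosis")]
```

Storing the taxonomy designations as attributes on each node is what lets a single query narrow the tree by tissue, disease, medication or sample parameter without walking a separate index.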
49 Claims
1. In a computer network, a method for determining patterns within gene expression data stored in a database containing biological data, the method comprising:
Steps (a) through (i) of claim 1 are set out in full under First Claim above.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
26. A network-based system for determining patterns within gene expression data stored in a database containing biological data, the system comprising:
a processor including a search engine;
a database comprising;
a plurality of sample nodes, each sample node comprising a curated data set comprising a set of pre-formatted and pre-computed biological data obtained from at least one biological sample, wherein the plurality of sample nodes are organized in a hierarchical arrangement according to clinical relevance;
a set of clinical attributes assigned to each sample node, the set of clinical attributes including at least one taxonomy designation selected from the group consisting of tissues, diseases, medications and sample parameters;
a user interface to enter a search query and display search results, wherein the search query comprises an instruction to the search engine to search a category from the group consisting of biological materials, biological material families, biological pathways, and sample set taxonomy, and wherein each sample node of the plurality of sample nodes is associated with a plurality of search categories; and
a tracking module for storing data corresponding to a plurality of user selected sample nodes and further comprising means for receiving a user interface selection of an algorithm to be executed by the processor for performing gene expression pattern matching for identifying genes or gene fragments within the user selected sample nodes that have similar gene expression patterns to a gene of interest, wherein the algorithm comprises the steps of;
(i) computing a plurality of pairwise comparisons between the gene of interest and the genes or gene fragments within the user selected sample nodes, wherein each comparison is encoded using a qualitative three-state encoding scheme, wherein up-regulation of gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a first symbol, down-regulation of gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a second symbol different from the first symbol and no change in gene expression in the gene of interest relative to the genes or gene fragments within the one or more sample nodes is assigned a third symbol different from the first and second symbols wherein the three-state encoding scheme comprises a non-quantitative indication of gene behavior;
(ii) generating a three-by-three contingency matrix for each pairwise comparison using the three-state encoding scheme;
(iii) determining a distance score for each pairwise comparison;
(iv) generating a listing of lowest distance scores, wherein the lowest distance scores correspond to genes or gene fragments having the highest similarity to the gene of interest; and
(v) displaying the listing of the lowest distance scores and the corresponding genes or gene fragments.
Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
Specification