System and method for storage and analysis of gene expression data
Abstract
In a computer system for analysis of gene expression data, a gene expression database is organized in a hierarchical b-tree according to the descriptive and clinical sample attributes stored in the database. A user submits a query for searching the database and defines attributes on which to filter for each level of the b-tree. A simple search can be employed to arbitrarily group together leaf nodes depending on their attributes. The grouped leaf nodes are used as “control” and “experimental” sample sets. A t-test can be performed to test for statistically significant regulation between the control and experimental sample sets. In one embodiment, the results of the b-tree analysis are organized as a table of information which may be part of a relational database. The data in the database are encoded according to a three-state scheme based on regulation behavior. A similarity search algorithm can be performed on the encoded data to identify genes or gene fragments that show regulation profiles similar to the query gene or gene fragment, ranking the genes according to the level of similarity.
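The three-state encoding and similarity ranking described in the abstract can be illustrated with a minimal sketch. The source does not give the encoding thresholds or the similarity measure, so the fold-change threshold (`FOLD_THRESHOLD`), the agreement-fraction score, the function names, and the data below are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold: the source does not specify how the three states are assigned.
FOLD_THRESHOLD = 2.0


def encode_three_state(fold_change: float, threshold: float = FOLD_THRESHOLD) -> int:
    """Map a fold change to +1 (up-regulated), -1 (down-regulated), or 0 (no change)."""
    if fold_change >= threshold:
        return 1
    if fold_change <= 1.0 / threshold:
        return -1
    return 0


def similarity(a: List[int], b: List[int]) -> float:
    """Fraction of comparisons in which two encoded profiles agree (an assumed measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_similar(query: str, profiles: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank every other gene by similarity to the query gene, most similar first."""
    q = profiles[query]
    scored = [(gene, similarity(q, p)) for gene, p in profiles.items() if gene != query]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Made-up fold changes for three genes across four comparisons.
fold_changes = {
    "geneA": [3.0, 0.4, 1.1, 2.5],
    "geneB": [2.2, 0.3, 0.9, 1.2],
    "geneC": [0.2, 4.0, 1.0, 0.1],
}
encoded = {g: [encode_three_state(fc) for fc in fcs] for g, fcs in fold_changes.items()}
ranking = rank_similar("geneA", encoded)
print(ranking)
```

Encoding first and comparing afterwards keeps the ranking insensitive to the magnitude of regulation; only the direction of each regulation event contributes to the score.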
36 Claims
1. A method for analyzing gene expression data, the method comprising:
(a) organizing data pertaining to a plurality of samples into a b-tree comprising a plurality of levels, each level comprising a plurality of leaf nodes;
(b) defining a plurality of attributes for filtering the data at each level of the b-tree;
(c) distributing the data among the plurality of leaf nodes according to the plurality of attributes;
(d) grouping the leaf nodes according to their corresponding attributes;
(e) defining a control sample set and an experimental sample set;
(f) performing a t-test comparing the experimental sample set with the control sample set; and
(g) generating a table of t-test results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
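Steps (a) through (g) of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: a plain nested dictionary stands in for the hierarchical tree (it is not a balanced b-tree), Welch's unequal-variance t-test is assumed because the claim does not name a variant, and the attribute names (`tissue`, `treatment`), sample records, and expression values are invented.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Invented sample records; each carries descriptive attributes and expression values.
samples = [
    {"id": "s1", "tissue": "liver", "treatment": "vehicle", "expr": {"g1": 5.0, "g2": 7.1}},
    {"id": "s2", "tissue": "liver", "treatment": "vehicle", "expr": {"g1": 5.2, "g2": 7.0}},
    {"id": "s3", "tissue": "liver", "treatment": "vehicle", "expr": {"g1": 4.8, "g2": 6.9}},
    {"id": "s4", "tissue": "liver", "treatment": "compound", "expr": {"g1": 8.0, "g2": 7.0}},
    {"id": "s5", "tissue": "liver", "treatment": "compound", "expr": {"g1": 8.4, "g2": 7.2}},
    {"id": "s6", "tissue": "liver", "treatment": "compound", "expr": {"g1": 7.6, "g2": 6.8}},
    {"id": "s7", "tissue": "kidney", "treatment": "vehicle", "expr": {"g1": 3.1, "g2": 6.0}},
]


def build_tree(records, attributes):
    """Steps (a)-(c): one tree level per attribute; leaves are lists of samples."""
    if not attributes:
        return list(records)
    first, rest = attributes[0], attributes[1:]
    buckets = defaultdict(list)
    for record in records:
        buckets[record[first]].append(record)
    return {value: build_tree(group, rest) for value, group in buckets.items()}


def welch_t(experimental, control):
    """Step (f): Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(experimental), len(control)
    v1, v2 = variance(experimental) / n1, variance(control) / n2
    t = (mean(experimental) - mean(control)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


tree = build_tree(samples, ["tissue", "treatment"])

# Steps (d)-(e): pick leaves by attribute to form the two sample sets.
control = tree["liver"]["vehicle"]
experimental = tree["liver"]["compound"]

# Step (g): one row of t-test results per gene.
table = []
for gene in sorted(control[0]["expr"]):
    t, df = welch_t(
        [s["expr"][gene] for s in experimental],
        [s["expr"][gene] for s in control],
    )
    table.append({"gene": gene, "t": round(t, 3), "df": round(df, 2)})

for row in table:
    print(row)
```

Each leaf needs at least two samples with non-identical values for the variance terms to be defined; a fuller version would check for that before testing and would convert the statistic to a p-value.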
18. A computerized storage and retrieval system of biological information comprising:
a stored database containing records pertaining to a plurality of gene regulation events for each of a plurality of control samples and experimental samples, wherein the database comprises a plurality of relational tables, each relational table having a means for linking to at least one other relational table of the plurality;
the plurality of relational tables comprising:
a first table of the plurality of relational tables comprising a plurality of gene regulation event categories into which at least some of gene regulation events are grouped, the first table comprising results of a plurality of comparisons of selected control samples and selected experimental samples, wherein the selected control samples and selected experimental samples are results of a b-tree analysis;
at least one second table of the plurality of relational tables comprising a plurality of labels for associating the plurality of comparisons with descriptions of the comparisons, and a user interface allowing a user to selectively view information regarding the plurality of gene regulation events.
Dependent claims: 19, 20, 21, 22, 23, 24, 25
26. A database system having a plurality of internal records, the database comprising:
a first plurality of records specifying gene regulation events for a plurality of samples;
a second plurality of records specifying comparison results from comparison sets comprising selected experiment sample sets and selected control sample sets, wherein a first portion of the plurality of samples is designated sample control sets and a second portion of the plurality of samples is designated experiment sample sets, wherein the comparison results are derived from b-tree analysis of the comparison sets;
a third plurality of records specifying comparison context comprising data describing how a comparison set was selected and analyzed; and
a fourth plurality of records comprising a plurality of links for associating the first, second and third plurality of records.
Dependent claims: 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
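One way to picture the four record sets of claim 26 is as four linked relational tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and inserted values are hypothetical and are not taken from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per record set named in the claim.
conn.executescript("""
CREATE TABLE regulation_event (      -- first plurality: gene regulation events
    event_id  INTEGER PRIMARY KEY,
    sample_id TEXT,
    gene      TEXT,
    state     INTEGER
);
CREATE TABLE comparison_result (     -- second plurality: comparison results
    result_id   INTEGER PRIMARY KEY,
    gene        TEXT,
    t_statistic REAL
);
CREATE TABLE comparison_context (    -- third plurality: how the set was selected
    context_id  INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE record_link (           -- fourth plurality: links among the other three
    event_id   INTEGER REFERENCES regulation_event(event_id),
    result_id  INTEGER REFERENCES comparison_result(result_id),
    context_id INTEGER REFERENCES comparison_context(context_id)
);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO regulation_event VALUES (1, 'sampleX', 'geneA', 1)")
conn.execute("INSERT INTO comparison_result VALUES (1, 'geneA', 4.25)")
conn.execute(
    "INSERT INTO comparison_context VALUES (1, 'treated vs untreated, t-test')"
)
conn.execute("INSERT INTO record_link VALUES (1, 1, 1)")

# Follow the links to view an event together with its result and context.
rows = conn.execute("""
SELECT e.sample_id, r.gene, r.t_statistic, c.description
FROM record_link AS l
JOIN regulation_event   AS e ON e.event_id   = l.event_id
JOIN comparison_result  AS r ON r.result_id  = l.result_id
JOIN comparison_context AS c ON c.context_id = l.context_id
""").fetchall()
print(rows)
```

Keeping the links in their own table lets one comparison result be associated with many regulation events without duplicating the context description.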
Specification